Eric Joyce, Marketing Specialist
@ericmjoyce
The emergence of “big data” has created new opportunities for individuals and organizations across all industries to better understand patterns, trends and associations in human behavior and interaction. These data have also raised new privacy concerns that call for an ethical framework for data scientists and other “big data” aggregators.
To help address these concerns, Professor H.V. Jagadish launched his Data Science Ethics MOOC on edX earlier this year, examining the complexity of ethics, data ownership and privacy in data science. The course is designed for novices in the field and established data scientists alike. More than 4,500 lifelong learners enrolled in “Data Science Ethics” last May, and Professor Jagadish has re-launched the course this fall. Throughout the class, learners explore ethical issues regarding who owns data, how informed consent is granted and how different aspects of privacy are valued. Course materials are designed for lifelong learners to explore independently and are published as open educational resources so that other educators may use them to supplement their curriculum.
As a leading expert in the field of data science, Professor Jagadish has contributed to Slate Magazine, U.S. News and World Report and The Conversation U.S., and he authors his own blog, Big Data Dialog, where he discusses key issues surrounding ethics in data science. He is also a grant recipient from the Bill and Melinda Gates Foundation’s Grand Challenges Explorations family of initiatives, designed to stimulate innovation to solve key global health and development problems.
Earlier this year, we spoke with Professor Jagadish about the impact of data science and the need to establish ethical guidelines in the field. We followed up with him to discuss his experiences during the initial launch of the course and how he incorporates real-world case studies that examine emerging ethical dilemmas from a data science perspective.
What questions have emerged in the field of data science and/or surrounding data ethics since first launching the course?
The crucial importance of data science ethics has grown tremendously even within the few months since the course was launched.
The White House put out a report, “Big Data: A Report on Algorithmic Systems, Opportunity, and Civil Rights,” laying out a U.S. national perspective on data science ethics and underlining the importance of the kind of training this MOOC offers.
Cathy O’Neil wrote an excellent, if alarming, book: Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. In this book she points out the many ways in which Big Data can hurt the poor, minorities and the underprivileged. I believe that Data Scientists trained to think ethically are the primary solution to the problems she so eloquently points out.
Tell us about the different case studies you incorporate in the course. How do these illustrate the real-world impact of data science ethics?
Data science has a tremendous impact on society, and we are still coming to understand that impact. So there is an interesting news story almost every week. I chose a few of the most interesting stories to make into case studies for the original launch of this course in May. I now have material for two or three more case studies since then. However, I am resisting the temptation to tinker with the course material too much. So I have not added these new case studies into the course in a full-blown way. Rather, I am planning to introduce them lightly, in discussion.
One other thing we are doing differently in this re-launch is to make the course self-paced, with all the material online on day 1. This gives the students much more flexibility with their timing, at the cost of some dispersion in the discussion threads.
What common ethical concerns may faculty, students, alumni and/or the public encounter from a data science perspective and what can they do to address them?
Even as society as a whole is becoming more aware of and more concerned about the problems that can arise from the inappropriate use of data science, we still do not have basic ethics education as part of the data science curriculum. We need this. We need every practitioner of data science to understand the potential impact of their work on society, to take responsibility for it, and to have enough understanding of the critical issues to be able to exercise good judgment.
In what ways can students apply what they learn in this course to make informed data science decisions or help organizations manage data responsibly and ethically?
In the practice of data science, a data scientist makes many decisions every day: what variables to include in their model, what training data to use, how to resolve errors and missing values in data, what types of explanations to obtain from algorithms, and on and on. Many of these decisions can have significant downstream impact. The teachings of this course will help the data scientist to recognize these possible impacts and to take them into consideration as they make these everyday decisions. The end result is a win-win: better decisions get made, with clear appreciation of societal impacts.
Of course, there will be some situations where we cannot get to a win-win. There may be a conflict between a desired ethical path and other pressing requirements. These will need to be worked out. This course will at least help frame the issues, so that there can be a meaningful discussion that leads to a well-considered decision.
Professor Jagadish’s Data Science Ethics course is available now on edX. Visit the Data Science Ethics page to enroll in this four-week course.
H.V. Jagadish
Professor of Electrical Engineering and Computer Science
College of Engineering
University of Michigan