
Impact of Data Science: An Interview with Professor H. V. Jagadish

Onawa Gardiner, Marketing Specialist
@onawanna

On May 1, Professor H.V. Jagadish, in collaboration with DEI, will launch the Data Science Ethics MOOC on edX. The MOOC aims to cultivate guidelines for ethical practice that apply broadly across the data science field. An expert in data science, Professor Jagadish has contributed to publications such as Slate Magazine, U.S. News and The Conversation U.S. He recently received a grant from the Bill and Melinda Gates Foundation, through the Foundation's Grand Challenge Explorations, to use data to promote social good. He also discusses key issues in data science ethics on his blog, Big Data Dialog.

We sat down with Professor Jagadish to discuss the importance of ethics in this growing field and the impetus for creating a MOOC that provides a structure for data science ethics.


What motivated you to design a succinct, educational module for data science ethics and how do you see this course being used in the future?

Data Science is having a huge impact on our society today. While progress in Data Science is mostly a matter of better technology (better systems and better algorithms), Data Science is not all about technology. As sentient beings in a rapidly changing world, we need to be aware of where technology begins and where it ends. As a technologist myself, I have seen far too many techies who confine themselves to a narrow technology box and refuse to see the real impact their work is having on people. My goal is to enrich the education of every Data Scientist so that they can see the impact they are having on the world, own this impact, and ensure that it is in line with what they would like to see happen.

Your blog, Big Data Dialog, focuses on validity, privacy and fairness as the three pillars of data science ethics. Why did you select these three, and how do they together form the basis for ethics in data science?

Ethics is a very broad topic, and there is a great deal of deep philosophical thinking about it. My goal in the course is to package a few key concepts that give the typical Data Scientist the tools to reason through most situations they are likely to face. While I do not have the same constraints in my blog, I feel strongly that a blog has the greatest impact if it is organized by topic. I have chosen three major themes in Data Science Ethics as the initial organizing structure for my blog: validity, privacy and fairness.

How has working with DEI and edX on the MOOC brought new aspects of data science ethics to light or led you to reevaluate certain aspects?

The DEI team has been fantastic to work with. They have asked me many penetrating questions that have made me think hard about the objectives of this MOOC and the best way to present the material I have in mind. I believe the MOOC is much improved thanks to their questions and creative input.

How has recent news about algorithmic bias exposed the fallacy of the notion that data speaks for itself? How can ethical guidelines address this?

Yes, algorithmic bias has indeed been in the news recently. It is a great example of why it is not enough for Data Scientists merely to manage and analyze a given data set; they must also understand the goals of the analysis and own the impact of their work.

How have you seen the data science field evolve, and what changes have made structured ethical guidelines necessary?

We all make thousands of decisions every day, which collectively have an impact on us as an economy, as a society and as a planet. These decisions can be as banal as what time to wake up in the morning, what brand of toothpaste to use and what route to take to work. For most of us, for most of these decisions, working out the broader impacts is too much: we just want to decide based on our local preferences and move on. But in the aggregate, the decisions each of us makes individually do matter. Technological progress is similar. Creative minds devise new ways of doing something, and finding the new way is progress enough: it is unfair, and it impedes progress, to ask them to also address hard-to-identify consequences. However, as a technology matures, and particularly as it begins to have the kind of pervasive social impact that Data Science has today, its societal effects can no longer be ignored. Ethical guidelines then become crucial.

As we continue to see more emphasis on data science, how do you anticipate the role of ethical guidelines will change and grow in the future?

The greater the impact of Data Science on our society, and the greater the number of different spheres in which this impact occurs, the greater will be the importance of considering ethics.

Enrollment is open for Data Science Ethics, which launches on May 1. To learn more or to enroll, visit Data Science Ethics.


H.V. Jagadish
Professor of Electrical Engineering and Computer Science
College of Engineering
University of Michigan