Data, Responsibly: The Next Decade of Data Science
The first decade of data science has largely focused on what *can* be done with large, noisy, heterogeneous datasets. The next decade will be characterized by what *should* be done: How do we ensure accountability, fairness, and transparency in algorithmic decision-making, so that it combats rather than reinforces inequities? As we apply these approaches in social contexts, how do we ensure the privacy of individuals? As these techniques become increasingly democratized, how do we avoid junk science: spurious, non-reproducible findings? How do we curate and expose existing data to make them "safe" for useful science? In this talk, I'll describe some work underway in this space at UW and elsewhere, and where we need greater investment from the larger data community. I'll focus on the deep curation project, where our aim is to automatically extract claims from scientific papers and validate them against open data, helping to combat the reproducibility crisis in science.