Privacy-first Data Management Systems
Responsible data stewardship has largely been an afterthought over the last two decades, as new techniques and tools were rapidly developed to harness the potential of "big data". However, increased scrutiny by regulatory bodies (resulting in regulations like GDPR and CCPA), high-profile data breaches, and the widespread use of data-driven processes to make life-altering decisions have brought to the forefront the issues of transparency, trust, and the responsible and ethical use of data. In this talk, I will discuss some of the new data management challenges that have emerged as a result of these developments, focusing in particular on the need to build novel privacy-first database systems that operationalize privacy-by-design principles. I will describe some preliminary work that uses pseudonymization and synthetic data generation to transparently rearchitect a relational database system to achieve a variety of privacy goals, and outline the research challenges moving forward. I will then briefly discuss other recent research projects in my group, including our prior and ongoing work on graph databases and on building a unified provenance and metadata management system to support data science lifecycle management.