CMSC828E: Privacy and Ethics in Data Management Systems

Prof. Amol Deshpande; Online; Tue-Thu 3:30pm-4:45pm




Description:

Huge volumes of personal, sensitive data are routinely collected, processed, and analyzed in today's world by entities ranging from household-name companies like Google, Amazon, and Facebook to relatively unknown data brokers. There is increasing awareness both of the extent of the data being harvested and of the real potential harms to people from how that data is used. As a result, data privacy and responsible, ethical data stewardship are rapidly growing in importance. This has led to a slew of regulations, such as the European Union's General Data Protection Regulation (GDPR), California's CCPA, and Brazil's LGPD, all of which require organizations to be more transparent about their data collection and usage practices and to give users more control over their data.

Engineering privacy and ethics into data processing pipelines raises many novel and difficult challenges spanning data management systems, data processing frameworks, machine learning, security, HCI, programming languages, and software engineering, to name a few.

The goal of this course is to explore techniques and approaches for building new platforms, especially data management systems and big data frameworks, that support privacy, ethics, transparency, and fairness.

We will discuss both foundational techniques, such as differential privacy and trusted computing, and systems research on building practical, usable systems.

Some of the topics we intend to cover include:

  • How privacy and ethics are regulated today, and how they are expected to be regulated
  • Anonymization/pseudonymization of data, and re-identification attacks against anonymized data
  • Differential Privacy
  • Encrypted databases and related techniques
  • Techniques for incorporating fairness and transparency into data processing pipelines
  • Access control and data provenance
  • Data flows in the adtech/marketing world
  • Recent work on engineering privacy into data management frameworks
  • Recent work on these topics in programming languages and data storage systems
A tentative reading list will be posted soon.

Approach:

This is a research-oriented seminar course based on reading and discussing papers from recent conferences.

The course counts as a PhD and MS qualifying course in Databases.

Class forum:

We will use Slack for class communications and discussions: Link to Join.

Course Grading:

Grading will be based on class participation and paper summaries (20%), assignments (30%), a take-home final (20%), and a class project (30%). More details are available on the Assignments tab.