Algorithms for Data Science CMSC 644

Class Time: Wed 7:00pm–9:30pm, CSI 3120.

Overview: As large amounts of data are being created, it is important to understand how to analyse the data to extract interesting trends and patterns. Because the volume of data is large, it may not be feasible to make more than a single pass over it. Stream processing methods provide effective ways to extract useful information from large data sets while making very few passes over the data. Surprisingly, a lot of information can be gleaned from a single pass, or a small number of passes, over the data. The first part of the course will cover random sampling and stream processing methods. We will also consider privacy issues in databases and how they should be handled.
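As a small illustration of what a single pass can achieve, the Python sketch below implements reservoir sampling (the classic "Algorithm R"), one standard way to draw a uniform random sample of k items from a stream of unknown length using only O(k) memory. This is a minimal sketch, not necessarily the exact variant covered in lecture; the function name `reservoir_sample` and its parameters are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Hypothetical helper: return up to k items chosen uniformly at random
    from `stream`, reading it exactly once and storing at most k items."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):  # i is the 0-based position in the stream
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1) by picking a
            # uniform index in [0, i]; randint is inclusive on both ends.
            j = rng.randint(0, i)
            if j < k:
                # Evict a uniformly chosen current member.
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Sample 5 items from a million-element stream in one pass.
    print(reservoir_sample(range(1_000_000), k=5, rng=random.Random(42)))
```

The key design choice is that the i-th item (0-based) enters the reservoir with probability k/(i+1) and, if it does, replaces a uniformly chosen slot; a short induction shows that after any prefix of the stream, every item seen so far is in the reservoir with the same probability. If the stream has fewer than k items, the whole stream is returned.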

Prerequisites: CMSC 351. I expect familiarity with basic algorithms. This is an algorithms-oriented course and includes proofs of correctness and other formal analysis.

I will update this page every week during the semester and will post all homework assignments, as well as their solutions, here. If you have any trouble accessing them, please let me know.

Readings

Main Textbook:

Useful Readings:

Homeworks

To be added

Schedule

Here is the schedule from a previous term. It gives some idea of the course content, but the material for this term will NOT be the same!

  • Lecture 1 (Jan 29): Overview, data analysis, algorithms, Sampling
  • Lecture 2 (Feb 5): Bonferroni Principle, Sampling, Hash Functions [Chapter 4]
  • Lecture 3 (Feb 12): Streaming, Frequency Estimation, Distinct Element Estimation, Finding Frequent Elements [Chapter 4]
  • Lecture 4 (Feb 19): K-center; guest lecture by Prof. David Mount on Coresets [Chapter 5]
  • Lecture 5 (Feb 26): Linear Programming, Incremental K-center
  • Lecture 6 (March 5): Guest lecture by Dr. Jessica on Linear Programming and Gurobi. Files to be used
  • Lecture 7 (March 12): K-shingles, min-hash [Chapter 3]
  • Lecture 8 (March 26): LSH, Link Analysis [Chapter 3, 5]
  • Lecture 9 (April 2): Information Visualization (Ben Shneiderman), midterm
  • Lecture 10 (April 9): Social Network Analysis [Chapter 10?]
  • Lecture 11 (April 16): Guest lecture by Dr. Jessica, topic TBD
  • Lecture 12 (April 23): TBD
  • Lecture 13 (April 30): TBD
  • Lecture 14 (May 7): Final exam