Algorithms for Data Science (CMSC 644/498U)

Class Time: Mon 7:00pm-9:30pm. CSI 3120.

Overview: As ever larger amounts of data are created, it is important to understand how to analyse that data to extract interesting trends and patterns. When the volume of data is large, it may not be feasible to make more than a single pass over it. Stream processing methods provide effective ways to extract useful information from large data sets in a single pass, or in a small number of passes. Surprisingly, a lot of information can be gleaned this way. The first part of the course will cover random sampling and stream processing methods. We will also consider privacy issues in databases and how these should be handled.
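To give a flavor of single-pass random sampling, here is a minimal sketch of reservoir sampling, a standard technique for drawing k items uniformly at random from a stream of unknown length. It is only an illustration and not necessarily the presentation used in class; the function name `reservoir_sample` and its parameters are chosen for this example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return up to k items chosen uniformly at random from stream, in one pass.

    Hypothetical helper for illustration; uses O(k) memory regardless of
    how long the stream is.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-indexed) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Sample 5 values from a stream of one million integers without storing the stream.
    print(reservoir_sample(range(1_000_000), 5))
```

The stream is read exactly once and only k items are kept in memory at any time, which is the kind of guarantee stream processing methods aim for.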

Course Work: Course work will consist of homeworks and two exams. The relative weights of these will be 30% for the homeworks, 30% for the midterm and 40% for the final exam.

Prerequisites: CMSC 351. I expect familiarity with basic algorithms. This is an algorithms-oriented course, including proofs of correctness, etc.

Instructor: Samir Khuller

Office: AVW 3369. Office phone: (301) 405-6765.
samir@cs.umd.edu

Office Hours: Mon 5:30pm-6:45pm. If you cannot make these hours, please make an appointment to see me at a different time.

Teaching Assistant: Sheng Yang

Office: AVW 3457.
styang@cs.umd.edu

Office Hours: TuTh 4:00pm-5:00pm. If you cannot make these hours, please make an appointment with the TA to meet at a different time.

I will update this page every week during the semester. I will place all homeworks as well as solutions to homeworks here. If you have any trouble accessing them, please let me know.

Readings

Main Textbook:

Useful Readings:

Homeworks

  • Homework 1: link. Input files. Due date: 02/14, 5 p.m.
  • Homework 2: link. Input files. Due date: 03/01, 5 p.m.
  • Homework 3: link. Due date: 03/27, 5 p.m.
  • Homework 4 (same as the midterm; the higher of the midterm and Homework 4 scores will be used): link. Due date: 04/17, 5 p.m.
  • Homework 5: link. Input files. Due date: 05/02, 5 p.m.

Schedule

Previous Schedule

Here is the schedule from a previous term. It gives some idea of what the course covers, but the material for this term will NOT be the same!