| Date | Lecture Topics and Materials | Assignments |
| Introduction: What is data science. Major tools used by data scientists. Class overview. Lecture Notes. | Lab 0: Basic usage of GitHub, VirtualBox, IPython Notebook (Due 9/12) | |
| Basic Statistics: statistical tests, samples, fallacies. Lecture Notes. References: | ||
| Basic Statistics: linear regression, classification, clustering. Lecture Notes. | Lab 1: Python basic stats and plotting (Due 9/19) | |
| Data Models: Overview, Why modeling is essential, Commonly used models (Relational, JSON, Protocol Buffers). Lecture Notes. | ||
| Relational Databases, SQL. Lecture Notes. | Lab 2: Basic SQL; Python Pandas and Dataframes; Avro (Due 10/3) | |
| (cntd) | ||
| (cntd) | ||
| Data scraping and wrangling, Unix tools, GUIs. Lecture Notes. Readings for 828: | Lab 3: Advanced SQL and Pandas (Due 10/10) | |
| (cntd) | ||
| Data Integration: Overview, Schema mapping, Entity Resolution (Lecture Notes Continued). Readings for 828: | Lab 4: Data cleaning using Unix tools, Data Wrangler (Due 10/17) | |
| (cntd) | ||
| Information Extraction: Overview, Key Techniques (Lecture Notes Continued). Readings for 828: | Lab 5: Entity Resolution and Information Extraction (Due 10/28) | |
| Implementation of Relational Databases. Lecture Notes. | ||
| (cntd) | ||
| Distributed programming frameworks: Parallel Databases, MapReduce, Apache Spark, Hadoop Ecosystem Lecture Notes. | Lab 6: Hadoop, Spark (Due: 11/7) | |
| MIDTERM | ||
| (Cntd Distributed Programming Frameworks) | ||
| (cntd) | ||
| (cntd) Readings for 828: | Lab 7: Cassandra and MongoDB (Due: 11/17) | |
| (cntd) | ||
| Key-value stores: Basics, Differences from Relational Databases, Consistency/Replication issues. Lecture Notes. | Lab 8: Spark Streaming, Storm (Due: 11/26) | |
| (cntd) | ||
| Visualization: D3.js (see Lab 10 for notes) | ||
| Data streaming/Real-time analytics: Data streams in relational databases, Spark Streaming, Storm Lecture Notes. | Lab 9: Neo4j, GraphX (Due: 12/8) | |
| (cntd) | ||
| Graph Databases and Graph Analytics Lecture Notes. | Lab 10: D3 (Due: 12/11) | |
| (cntd) | ||
| Cloud computing: Overview, Virtualization, Data centers, Platform/Infrastructure-as-a-Service Lecture Notes. | ||
| (cntd) |