Systems for Machine Learning (CMSC828G)

Lecture Paper Readings

There will be two assigned readings for each lecture (starting on Feb XX). All students are expected to read all assigned papers before the lecture. Your class participation grade for the course will be determined by:

  • Questions about the reading: Every student will submit 2-3 questions or discussion topics on one of the readings (if your last name starts with A-M: Reading 1; if your last name starts with N-Z: Reading 2). These are due at 9 AM on the date of the lecture. Name the PDF that you submit on Gradescope as follows: MMDD-LastName-FirstName.pdf, where MM and DD are the month and day of the lecture.
  • Short presentation on the reading: Once during the semester, each student (paired with another student) will give a short presentation on one of the assigned readings. Presentations should be 4-5 minutes in total, including both students' parts, and follow the provided format (PDF, PPTX). Upload a PDF of the presentation by 9 AM on the date of the lecture. Name the PDF that you submit on Gradescope as follows: MMDD-LastName-FirstName.pdf, where MM and DD are the month and day of the lecture.
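The naming convention and the reading split described above can be sketched in a few lines of Python. This is only an illustrative helper, not an official course tool: the function names, the example student name, and the year are hypothetical, and it assumes that MM and DD are zero-padded and that only the first letter of the last name decides the reading.

```python
from datetime import date


def submission_filename(lecture_date: date, last_name: str, first_name: str) -> str:
    """Build MMDD-LastName-FirstName.pdf for a lecture date (zero-padding assumed)."""
    return f"{lecture_date:%m%d}-{last_name}-{first_name}.pdf"


def assigned_reading(last_name: str) -> int:
    """Return 1 for last names starting with A-M and 2 for N-Z."""
    initial = last_name.strip()[0].upper()
    if "A" <= initial <= "M":
        return 1
    if "N" <= initial <= "Z":
        return 2
    # Initials outside A-Z are not covered by the rule on this page.
    raise ValueError(f"No reading rule for last name {last_name!r}")


# Hypothetical student and year; only the month and day end up in the filename.
print(submission_filename(date(2026, 2, 12), "Lovelace", "Ada"))  # 0212-Lovelace-Ada.pdf
print(assigned_reading("Lovelace"))  # 1
```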
Resource on how to read a scientific paper.

Lecture Slides

| No. | Date | Topic and Slides | Reading 1 | Presenters | Reading 2 | Presenters | Additional Reading |
| | Jan 27 | Snow Day! (Class canceled) | | | | | |
| 1 | Jan 29 | Course Overview [video] | | | | | |
| 2 | Feb 3 | HPC: Introduction and Collectives | MPICH 2003 | | NCCLX 2025 | | PCCL 2025 |
| | Feb 5 | No Class | | | | | |
| | Feb 10 | HPC: Introduction and Collectives (contd.) | | | | | |
| 3 | Feb 12 | Programming in Triton | Triton 2019 | | Release Blog 2021 | | |
| | Feb 17 | Programming in Triton (contd.) | | | | | |
| 4 | Feb 19 | Deep Learning and Transformers | Attention 2017 | | GPT 2018 | | |
| 5 | Feb 24 | Performance Challenges and Modeling | COTS HPC 2013 | | Extra-Deep 2023 | | |
| 6 | Feb 26 | Data and Tensor Parallel Training | PyTorch FSDP 2023 | DC, AC | AxoNN 2024 | YGa, YGe | |
| | Mar 3 | Midterm Exam 1 (during class) | | | | | |
| 7 | Mar 5 | Pipeline and Hybrid Parallel Training | GPipe 2018 | EG, PHo | Megatron-LM 2021 | Guest - Mohammad Shoeybi [video] | |
| 8 | Mar 10 | Sparsity in Training | MegaBlocks 2022 | CA, YH | X-MoE 2025 | Guest - Sajal Dash [video] | MoE 2017, Sputnik 2020 |
| 9 | Mar 12 | Optimizing GPU Kernels | Flash Attention 2022 | PHu, CL | Auto-tuning 2024 | Guest - P. Sadayappan [video] | |
| | Mar 17 | Spring Break | | | | | |
| | Mar 19 | Spring Break | | | | | |
| 10 | Mar 24 | Introduction to Inference | vLLM 2023 | JL, KR | ORCA 2022 | CS, SX | Transformers 2022, Sarathi 2023 |
| 11 | Mar 26 | Memory offload | ZeRO-Infinity 2021 | JY, JZ | FlexGen 2023 | | InfiniGen 2024 |
| 12 | Mar 31 | Approximating Attention | H2O 2023 | ShZ, SuZ | Mamba 2023 | Guest - Albert Gu [video] | Top-k 2021, DeepSeek-V2 2024 |
| 13 | Apr 2 | Long context optimizations | RingAttention 2023 | | | | |
| 14 | Apr 7 | Quantization | LLM.int8() 2022 | | AWQ 2024 | | GPTQ 2022 |
| | Apr 9 | Midterm Exam 2 (during class) | | | | | |
| 15 | Apr 14 | Optimizing Data Movement | PCCL 2025 | | DataStates-LLM 2024 | Guest - | |
| 16 | Apr 16 | Agentic Systems | | | | | |
| 17 | Apr 21 | AI Coding Assistants | | | | | |
| 18 | Apr 23 | | | | | | |
| 19 | Apr 28 | Specific DL Models | CAGNET 2020 | | DLRM 2020 | Guest | |
| | Apr 30 | Project Presentations | | | | | |
| | May 5 | Project Presentations | | | | | |
| | May 7 | Project Presentations | | | | | |
| | May 11 | Final Project Due | | | | | |