Systems for Machine Learning (CMSC828G)

Lecture Paper Readings

There will be two assigned readings for each lecture (starting on Feb 19). All students are expected to read all of the assigned papers before the lecture. Your class participation grade for the course will be determined by:

  • Questions about the reading: Every student will submit 2-3 questions or discussion topics on one of the readings (Reading 1 if your last name starts with A-M; Reading 2 if it starts with N-Z). These are due at 9 AM on the date of the lecture. Name the PDF that you submit on Gradescope as follows: MMDD-LastName-FirstName.pdf, where MM and DD are the month and day of the lecture.
  • Short presentation on the reading: Once in the semester, each student (paired with another student) will give a short presentation on one of the assigned readings. Upload a PDF of the presentation by 9 AM on the date of the lecture, named as follows: MMDD-LastName-FirstName.pdf, where MM and DD are the month and day of the lecture. Presentations should be 4-5 minutes in total (including both students' parts) and follow the provided format (PDF, PPTX).
Resource on how to read a scientific paper.
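
The naming convention and last-name split described above can be sketched in a few lines of Python. This is only an illustrative helper, not an official course tool: the function names, the example student ("Jane Doe"), and the year passed to `date` are all assumptions made for the example (the year does not affect the MMDD prefix).

```python
from datetime import date


def submission_filename(lecture_date: date, last_name: str, first_name: str) -> str:
    """Build a name of the form MMDD-LastName-FirstName.pdf for a lecture date."""
    # %m and %d give the zero-padded month and day of the lecture.
    return f"{lecture_date:%m%d}-{last_name}-{first_name}.pdf"


def assigned_reading(last_name: str) -> int:
    """Return 1 for last names starting with A-M and 2 for N-Z."""
    initial = last_name.strip()[:1].upper()
    if "A" <= initial <= "M":
        return 1
    if "N" <= initial <= "Z":
        return 2
    raise ValueError(f"cannot determine reading for last name {last_name!r}")


# Hypothetical student and placeholder year, for illustration only.
name = submission_filename(date(2026, 2, 19), "Doe", "Jane")
print(name)                     # 0219-Doe-Jane.pdf
print(assigned_reading("Doe"))  # 1
```

Last names that do not begin with a letter in A-Z raise a `ValueError` here, since the page does not say how such names are assigned.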

Lecture Slides

AWQ 2024
| No. | Date | Topic and Slides | Reading 1 | Presenters | Reading 2 | Presenters | Additional Reading |
| | Jan 27 | Snow Day! (Class canceled) | | | | | |
| 1 | Jan 29 | Course Overview [video] | | | | | |
| 2 | Feb 3 | HPC: Introduction and Collectives | MPICH 2003 | | NCCLX 2025 | | |
| | Feb 5 | No Class | | | | | |
| | Feb 10 | HPC: Introduction and Collectives (contd.) | | | | | |
| 3 | Feb 12 | Programming in Triton | Triton 2019 | | Release Blog 2021 | | |
| | Feb 17 | Programming in Triton (contd.) | | | | | |
| 4 | Feb 19 | Deep Learning and Transformers | Attention 2017 | | GPT 2018 | | |
| 5 | Feb 24 | Performance Challenges and Modeling | COTS HPC 2013 | | Extra-Deep 2023 | | |
| 6 | Feb 26 | Data and Tensor Parallel Training | PyTorch FSDP 2023 | DC, AC | AxoNN 2024 | YGa, YGe | |
| | Mar 3 | Midterm Exam 1 (during class) | | | | | |
| 7 | Mar 5 | Pipeline and Hybrid Parallel Training | GPipe 2018 | EG, PHo | Megatron-LM 2021 | Guest - Mohammad Shoeybi [video] | |
| 8 | Mar 10 | Sparsity in Training | MegaBlocks 2022 | CA, YH | X-MoE 2025 | Guest - Sajal Dash [video] | MoE 2017, Sputnik 2020 |
| 9 | Mar 12 | Optimizing GPU Kernels | Flash Attention 2022 | PHu, CL | Auto-tuning 2024 | Guest - P. Sadayappan [video] | |
| | Mar 17 | Spring Break | | | | | |
| | Mar 19 | Spring Break | | | | | |
| 10 | Mar 24 | Introduction to Inference | vLLM 2023 | JL, KR | ORCA 2022 | CS, SX | Transformers 2022, Sarathi 2023 |
| 11 | Mar 26 | Memory offload | ZeRO-Infinity 2021 | JY, JZ | FlexGen 2023 | | InfiniGen 2024 |
| 12 | Mar 31 | Approximating Attention | H2O 2023 | ShZ, SuZ | Mamba 2023 | Guest - Albert Gu [video] | Top-k 2021, DeepSeek-V2 2024 |
| 13 | Apr 2 | Long context optimizations | StreamingLLM 2023 | JJ, AM | RingAttention 2023 | | Blockwise T. 2023 |
| 14 | Apr 7 | Quantization | QLoRA 2023 | SS, AS | GPTQ 2022 | | LLM.int8() 2022 |
| | Apr 9 | Midterm Exam 2 (during class) | | | | | |
| | Apr 14 | Project Work | | | | | |
| 15 | Apr 16 | Optimizing Data Movement | Mooncake 2024 | YZ, JX | PCCL 2025 | Guest - Siddharth Singh | DataStates-LLM 2024 |
| 16 | Apr 21 | Compound AI Systems | SpecDec 2022 | KA, YFC | HPC-R1 2025 | | Smurfs 2025 |
| 17 | Apr 23 | LLMs for Code | SWE-agent 2024 | AH, MK | HPC Perf 2026 | Guest - Daniel Nichols | |
| 18 | Apr 28 | Other DL Approaches | Plexus 2025 | AK, AM | AI/Systems 2025 | Guest - Audrey Cheng | DistDGL 2020 |
| | Apr 30 | Project Presentations | | | | | |
| | May 5 | Project Presentations | | | | | |
| | May 7 | Project Presentations | | | | | |
| | May 11 | Final Project Due | | | | | |