Systems for Machine Learning (CMSC828G)

Group Project

Overview

In the second half of the course, you will work on a project in groups of two or three. The idea is to apply the systems concepts learned in the course to accelerate and/or parallelize an ML workload. You may select the project topic yourself, but you must discuss it with the instructor once to ensure that the scope is appropriate.

Group Formation and Project Proposal

The first deadline is forming a group and submitting a project proposal. You should turn in a 1-2 page PDF in the MLSys 2025 style with the following sections.

  • Group members: Names of 2 or 3 group members.
  • Background and Motivation: Describe the systems or performance problem you want to solve.
  • Potential Approach(es): Describe your planned solution(s) to the problem. It is fine to have only rough ideas at this point.
  • Required Compute Estimate: An important part of systems for ML research is planning out your compute budget. Provide a rough estimate of the compute your project will need (CPU hours, GPU hours, etc.); see the worked example after this list.
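
As a hypothetical illustration of the level of detail expected (all numbers below are assumptions, not targets), a common back-of-the-envelope rule for training-style workloads is FLOPs ≈ 6 × N × D, where N is the parameter count and D is the number of training tokens. For example, fine-tuning a 1B-parameter model on 1B tokens for one epoch:

    FLOPs ≈ 6 × 1e9 × 1e9 = 6e18
    time  ≈ 6e18 / (0.40 × 312e12 FLOP/s) ≈ 4.8e4 s ≈ 13 A100 GPU hours per epoch

Here 312 TFLOP/s is the A100's bf16 peak and 40% is an assumed utilization. The point is to show your reasoning, not to land on an exact number.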

Interim Report

The interim report updates the project proposal with the progress made so far and should include preliminary profiling/performance results.
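
One way to gather such preliminary numbers is PyTorch's built-in profiler, shown here as a minimal sketch assuming a PyTorch workload (the model and inputs below are stand-ins for your own):

    import torch
    from torch.profiler import profile, ProfilerActivity

    # Stand-in workload; substitute your own model and inputs.
    model = torch.nn.Linear(4096, 4096)
    x = torch.randn(64, 4096)

    activities = [ProfilerActivity.CPU]
    if torch.cuda.is_available():
        model, x = model.cuda(), x.cuda()
        activities.append(ProfilerActivity.CUDA)

    # Record per-operator timings over a few iterations.
    with profile(activities=activities, record_shapes=True) as prof:
        for _ in range(10):
            model(x)

    # Print the top operators by total time.
    sort_key = "cuda_time_total" if torch.cuda.is_available() else "cpu_time_total"
    print(prof.key_averages().table(sort_by=sort_key, row_limit=10))

Tools such as Nsight Systems or Nsight Compute are equally appropriate; use whatever matches your workload.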

Final Report

At the end of the semester, students will present their projects in class and submit a final report that builds on the interim report. The presentations will take place during the last two weeks of class; each should be 12 minutes long, with 3-5 minutes for questions. The final report should be written like a research paper (introduction, background, methodology, results, etc.) in the MLSys format.

Important Dates

  Topic                                   Due on*
  Group formation and project proposal    Mar 10
  Interim report                          Apr 14
  Final presentation (in class)           Apr 30 - May 7
  Final report and code                   May 11

*All deadlines are at 11:59 PM Eastern time.

Project Ideas

An example project could be writing high-performance Triton kernels for an existing ML workflow in your research; another could be efficiently distributing a single-GPU workload across multiple GPUs. Please reach out to the instructor if you need help coming up with project ideas.
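
To make the first idea concrete, the sketch below shows a minimal Triton kernel (an elementwise add, assuming Triton and a CUDA-capable GPU are installed); a real project would target fused or memory-bound operations from your own workload:

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        # Each program instance handles one BLOCK_SIZE-wide slice of the input.
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements  # guard the ragged final block
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        out = torch.empty_like(x)
        n = x.numel()
        grid = (triton.cdiv(n, 1024),)  # one program per 1024 elements
        add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
        return out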

Titles of some projects from previous semesters:

  • Explore Lossless Compression for Communication in LLM Training
  • Scaling Graph Neural Network Training Across Multiple GPUs
  • Developing Efficient WebGPU Compute Shaders to Accelerate ML Workloads
  • Efficient Implementation of LLMs on Handheld Edge Devices
  • Profiling and Optimizing KV Caching and Prefix Caching in LLM Inference
  • Efficient LLM Inference on Heterogeneous GPUs
  • Accelerating Distributed LLM Inference on FPGAs
  • Profiling and Optimizing CycleGAN