Interpretability as the Inverse Machine Learning Pipeline

Talk

Sarah Wiegreffe

Talk Series:

Dept. Colloquium

Time:

11.14.2025 11:00 to 12:00

Location:

IRB 0318 (Gannon) or https://umd.zoom.us/j/93754397716?pwd=GuzthRJybpRS8HOidKRoXWcFV7sC4c.1

URL:

https://talks.cs.umd.edu/talks/4415

Language models (LMs) power a rapidly growing and increasingly impactful suite of AI technologies. However, due to their scale and complexity, we lack a fundamental scientific understanding of much of LMs’ behavior, even when they are open source. In this talk, I will describe some of our recent work on interpreting LMs through the lens of the classical machine learning pipeline. This includes 1) working backwards from behavioral analysis and explanation generation as a form of model evaluation, 2) interpreting model internals post-training, 3) understanding model training dynamics, and ultimately 4) attributing model behavior back to the training data, with the goal of building better training corpora for future LMs.
