Sarah Wiegreffe on Understanding AI Systems

The University of Maryland assistant professor discusses her path into computer science, her research on large language models, and her advice for students entering the field.

Sarah Wiegreffe, an assistant professor of computer science at the University of Maryland, studies how deep learning systems for language operate and how their behavior can be interpreted. Her research focuses on interpretability in machine learning, particularly large language models, and examines how these systems can be made more transparent, controllable and reliable.

Wiegreffe earned her Ph.D. in computer science from the Georgia Institute of Technology in 2022. She completed a postdoctoral appointment at UMD from 2022 to 2025 and previously held internships at Google and the Allen Institute for AI (Ai2), where she was recognized as an outstanding intern.

In a recent Q&A, Wiegreffe discussed her career trajectory, current research and perspectives on the evolving role of artificial intelligence.

Was there a defining moment that shaped your career path into computer science?

My interests started in high school, where I was drawn to linguistics and anthropology, but I also really enjoyed math and statistics. I liked the objectivity and structure of mathematics, and data science felt like a way to apply math to real-world problems. I majored in data science as an undergraduate when the field was still emerging.

The major combined statistics and computer science, and I remember feeling intimidated going into my first programming class because I wasn’t sure whether I would enjoy it. But I ended up doing well, and my professor offered me an internship afterward. That experience made me realize programming was something I wanted to pursue. Combining language and machine learning eventually led me into natural language processing.

Can you tell me about your research focus now and what drew you to this field?

My research focuses on interpretability in deep learning systems for language, which today largely means interpreting large language models. As these systems have grown more complex and widely used, understanding how they work has become increasingly important.

I initially became interested in interpretability while working on machine-learning applications in healthcare settings. When models are used by clinicians, explanations need to reflect the model’s actual internal reasoning rather than creating misleading confidence. That raised broader questions about how neural network systems function and how little we sometimes understand about their internal processes. The field often develops new models faster than it can fully explain them, and interpretability research tries to close that gap.

What are you currently working on, and what interests you most about these projects?

There are two main themes in my group’s work. One involves using interpretability to improve efficiency and customization in neural networks. We study how to modify model behavior at inference time, a process known as steering, so systems can adapt to different users or applications. Increasingly, people want personalized models rather than one-size-fits-all solutions, and that requires reliable ways to adjust model responses on the fly.
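
To make "steering" concrete, here is a minimal sketch of inference-time activation steering, assuming a Hugging Face GPT-2 model and a PyTorch forward hook. The layer index, the strength and the steering direction itself are illustrative placeholders; in practice, a steering direction is typically derived from contrasting activations (for example, responses to polite versus rude prompts) rather than drawn at random.

```python
# Minimal sketch of activation steering at inference time.
# Assumptions: Hugging Face GPT-2; LAYER, ALPHA, and the random `steer`
# vector are placeholders for illustration only.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

LAYER = 6    # which transformer block to steer (assumption)
ALPHA = 4.0  # steering strength (assumption)
steer = torch.randn(model.config.n_embd)  # placeholder direction

def add_steering(module, inputs, output):
    # GPT-2 blocks return a tuple; the hidden states are its first element.
    hidden = output[0] + ALPHA * steer / steer.norm()
    return (hidden,) + output[1:]

# Attach the hook, generate with the steered model, then detach.
handle = model.transformer.h[LAYER].register_forward_hook(add_steering)
ids = tokenizer("The weather today is", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(out[0]))
handle.remove()  # restores the unmodified model
```

Because the intervention happens only at inference, the same base model can be steered differently for different users without retraining, which is what makes this kind of on-the-fly customization attractive.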

The second theme focuses on model-generated explanations, such as chain-of-thought reasoning. These explanations are often presented as evidence of how a model reached an answer, but there are open questions about whether they accurately represent the model’s reasoning. We study the robustness of these explanations and whether they can help detect unsafe behavior or users’ attempts to manipulate models (jailbreaking).
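
One kind of robustness check used in this area is to perturb a model's stated chain of thought and see whether the final answer survives. Below is a minimal sketch of such a truncation test; it is an assumption-level illustration rather than any specific published protocol, and `query_model` is a hypothetical stand-in for whatever LLM API is in use, not a real library call.

```python
# Minimal sketch of a chain-of-thought truncation test.
# `query_model` is a hypothetical placeholder for an LLM API call.

def query_model(prompt: str) -> str:
    """Placeholder: send `prompt` to a language model and return its text."""
    raise NotImplementedError("connect this to a real LLM API")

def answer_with_reasoning(question: str, reasoning: str) -> str:
    # Re-present a (possibly truncated) chain of thought and ask the model
    # to commit to a final answer conditioned on it.
    return query_model(
        f"Question: {question}\nReasoning: {reasoning}\nFinal answer:"
    ).strip()

def answer_changes_when_truncated(question: str, full_cot: str,
                                  keep_fraction: float = 0.25) -> bool:
    """True if cutting most of the chain of thought flips the answer.

    If the answer rarely changes under heavy truncation, the stated
    reasoning may be post hoc rather than the computation that actually
    produced the answer.
    """
    steps = full_cot.split(". ")
    kept = ". ".join(steps[: max(1, int(len(steps) * keep_fraction))])
    return (answer_with_reasoning(question, kept)
            != answer_with_reasoning(question, full_cot))
```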

How does your work connect with the broader computer science community and society?

The goal is to develop technical methods that give people more agency when interacting with AI systems. That includes enabling personalization and helping users better understand how models generate outputs. Improving interpretability can also support AI literacy as language models become more widely used.

Academic researchers also play an important role in studying the scientific foundations of language models, especially as many commercial systems are proprietary. Academic work helps advance an open understanding of how these systems function.

What inspired you to join the University of Maryland, and what have you enjoyed most so far?

I visited during a Rising Stars workshop and was drawn to the department’s collaborative environment and breadth of research areas. Maryland has strong groups in natural language processing, machine learning, computer vision, and human-computer interaction, creating opportunities for interdisciplinary work.

Also, working with students has been a highlight. Many are interested not only in technical questions but also in societal issues such as safety and reliability, which align closely with my research interests.

What advice would you give to students interested in your line of research?

It’s important to stay grounded in why you are interested in the field and the kind of impact you want your work to have. AI research currently moves very quickly, and keeping that perspective helps filter out distractions and focus on meaningful problems.

—Story by Samuel Malede Zewdu, CS Communications 

The Department welcomes comments, suggestions and corrections. Send email to editor [-at-] cs [dot] umd [dot] edu.