PhD Proposal: Knowledge Localization, Editing, and Unlearning in Foundation Models

Talk
Keivan Rezaei
Time: 09.04.2025, 16:30 to 18:00

As foundation models continue to transform the landscape of artificial intelligence, demonstrating impressive capabilities across vision and language tasks, the need for interpretability and control becomes increasingly critical. My research develops methods for interpreting such models by localizing the knowledge they encode and for leveraging this understanding to enable model editing and machine unlearning. The work spans three major areas: interpretability of vision models, where I propose methods for mapping internal representations to human-understandable concepts and for explaining failure modes; knowledge localization and editing in text-to-image generative models, including techniques to identify and modify the layers responsible for specific concepts; and machine unlearning in large language models, where I introduce benchmarks and algorithms that improve unlearning efficacy, particularly through the use of intermediate checkpoints. Together, these efforts contribute to building more transparent, controllable, and adaptable AI systems.