PhD Defense: Enhancing Trustworthiness and Safety in Foundation Models

Talk
Yihan Wu
Time: 11.06.2025, 15:00 to 17:00

The rapid deployment of machine learning systems in safety-critical and high-impact applications has amplified the need for models that are both trustworthy and efficient. In this talk, I will present two complementary lines of research toward this goal. First, I will discuss advances in adversarial robustness for both classification and retrieval models, highlighting new algorithmic and theoretical insights that improve resilience against adversarial perturbations while preserving accuracy and scalability. Second, I will introduce recent progress on watermarking techniques for large-scale foundation models, which aim to enable reliable attribution and responsible use of model outputs without degrading their quality. Together, these directions underscore a broader vision of developing principled methods that enhance the safety, reliability, and accountability of modern machine learning systems.