PhD Proposal: Multi-Exit Design Across the Language Model Lifecycle: Toward Efficient, Robust, and Adaptive Models

Talk
Kamala Varma
Time: 05.20.2025 10:00 to 11:30
Location: 

As large language models (LLMs) become more powerful and widely adopted, the costs of training and operating them are growing at an alarming rate. A promising direction is to build multi-exit models (MEMs), originally introduced in the image domain to reduce inference costs by allowing samples to exit the model early. However, the full potential of such input-adaptive mechanisms is not well understood. In particular, MEMs are vulnerable to slowdown attacks, which delay exits and can thereby negate the models' computational savings; this emerging threat model lacks defenses and is under-explored in the language domain. Moreover, the potential for MEMs to benefit the training phase remains untapped. My research addresses both gaps by investigating MEMs in settings with greater computational demands than image-based inference: the language domain and the training process.
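To make the early-exit mechanism concrete, the sketch below shows a toy confidence-based multi-exit classifier in PyTorch. The class name, layer sizes, and threshold are illustrative assumptions rather than the specific MEMs studied in this proposal: each intermediate head returns a prediction as soon as its softmax confidence crosses a threshold, and a slowdown attack would perturb the input so that no intermediate head becomes confident, forcing the sample through the full model.

```python
import torch
import torch.nn as nn

class MultiExitClassifier(nn.Module):
    """Toy multi-exit model with an exit head after every backbone block.
    Names and sizes are illustrative assumptions, not the MEMs in this proposal."""

    def __init__(self, dim=128, num_classes=10, num_blocks=4, threshold=0.9):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(num_blocks)]
        )
        self.exits = nn.ModuleList(
            [nn.Linear(dim, num_classes) for _ in range(num_blocks)]
        )
        self.threshold = threshold  # softmax-confidence exit criterion

    @torch.no_grad()
    def forward(self, x):
        h = x
        for depth, (block, exit_head) in enumerate(zip(self.blocks, self.exits), start=1):
            h = block(h)
            probs = exit_head(h).softmax(dim=-1)
            conf, pred = probs.max(dim=-1)
            # Exit as soon as this head is confident enough. A slowdown attack
            # perturbs the input so confidence stays below the threshold at every
            # intermediate exit, forcing the sample through all blocks.
            if conf.item() >= self.threshold:
                return pred, depth
        return pred, depth  # no head was confident: fall through to the final exit

model = MultiExitClassifier()
prediction, exit_depth = model(torch.randn(1, 128))  # single-sample inference
print(f"exited at block {exit_depth}")
```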
First, I uncover various vulnerabilities specific to language-based MEMs and analyze their underlying causes. I also propose a lightweight defense that improves robustness against these slowdown attacks. Second, I introduce the multi-exit mechanism into the training process, making training itself input-adaptive, a property most existing efficiency techniques lack. I begin with Federated Learning with input-Adaptive Multi-Exiting (FLAME), wherein the collaborative nature of FL naturally mitigates the under-optimization of later-layer parameters often caused by multi-exit training. This design is especially well-suited to FL clients, which are often resource-constrained and stand to benefit from the computational savings it provides.
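The sketch below, reusing the toy model above, shows one way a training step can be made input-adaptive; it is an assumption-laden illustration, not FLAME itself. Each sample contributes loss only up to the first exit that is already confident about it and then skips the remaining blocks, which saves compute but also means the deeper blocks are trained only on the harder samples that survive, i.e., the under-optimization of later-layer parameters noted above.

```python
import torch
import torch.nn.functional as F

def input_adaptive_training_step(model, optimizer, x, y, threshold=0.9):
    """One input-adaptive multi-exit training step (an illustrative sketch, not FLAME).
    `model` is assumed to expose `blocks` and `exits` like the toy class above."""
    optimizer.zero_grad()
    h, targets = x, y
    losses = []
    for block, exit_head in zip(model.blocks, model.exits):
        h = block(h)
        logits = exit_head(h)
        losses.append(F.cross_entropy(logits, targets))
        # Samples whose exit confidence crosses the threshold stop here: they
        # contribute no further loss and skip the remaining blocks entirely,
        # so deeper blocks only ever see the harder, still-uncertain samples.
        keep = logits.softmax(dim=-1).max(dim=-1).values < threshold
        if not keep.any():
            break
        h, targets = h[keep], targets[keep]
    loss = torch.stack(losses).sum()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In FL, as the abstract suggests, aggregating updates from many clients whose samples exit at different depths can help ensure that all blocks still receive sufficient updates.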
These results open new research avenues that leverage the multi-exiting idea. Building on insights from FLAME, I propose adapting multi-exit training to centralized settings with Centralized Multi-Exit Training (CoMET). CoMET will involve periodically adjusting exit criteria to emulate the collaborative benefits of FL, integrating curriculum learning strategies to enhance accuracy, and designing an adapted MEM architecture that makes multi-exiting more practical for generative models in terms of time and space complexity. I also plan to explore the privacy and security implications of training-time early-exiting. Overall, my research aims to lay the groundwork for more efficient, robust, and adaptive LLMs.
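As a closing illustration, "periodically adjusting exit criteria" could, for example, take the form of a simple threshold schedule like the hypothetical one below. This is only a sketch of the general idea, not the CoMET design, which the proposal leaves to future work.

```python
def periodic_exit_threshold(step, base=0.9, period=500, full_depth_fraction=0.1):
    """Hypothetical schedule for periodically adjusting the exit criterion (not CoMET).
    For a small window in every period the threshold is raised to 1.0, so essentially
    no sample exits early and the deeper blocks receive gradients from the full batch,
    loosely emulating the mixing effect of an FL aggregation round."""
    in_full_depth_window = (step % period) < int(period * full_depth_fraction)
    return 1.0 if in_full_depth_window else base
```

Such a schedule could be plugged into the training step above by passing `threshold=periodic_exit_threshold(step)` at each step.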