PhD Defense: Steering Generative AI on the fly: Inference-time Approaches for Safe, Reliable and Inclusive Language Models

Talk
Soumya Suvra Ghosal
Time: 
04.17.2026 15:00 to 17:00
Location: 
IRB-5105 umd.zoom.us/my/dmanocha

Large language models (LLMs) are typically aligned with human values through training-time methods such as reinforcement learning from human feedback. However, these methods produce static policies that cannot adapt to adversarial inputs unseen during training, reasoning challenges that emerge at test time, or the linguistic diversity of low-resource languages. In this thesis, we develop inference-time alignment methods that steer the model's generation during deployment using reward models, without modifying model parameters. We address four complementary challenges: principled decoding for alignment, safety against adversarial attacks, effective test-time reasoning, and inclusive adaptation for low-resource languages.
We first establish theoretical foundations for inference-time alignment. Transfer-Q* resolves the central estimation bottleneck of controlled decoding by leveraging existing aligned models, with formal sub-optimality guarantees and consistent improvements across six evaluation setups. SITAlign extends this to multi-faceted alignment using satisficing theory: it maximizes a primary objective while enforcing threshold constraints on secondary criteria, outperforming multi-objective decoding baselines by 22.3% on PKU-SafeRLHF. We then address safety for multimodal models. Immune reformulates jailbreak defense as an inference-time optimization problem, reducing attack success rates by 30-60% across five models and four benchmarks, and SafeThink recovers safety in reasoning-augmented models by steering only the first 1-3 reasoning steps, cutting attack success rates by up to 45% while preserving reasoning performance.
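The satisficing selection rule behind this style of multi-faceted decoding can be illustrated with a minimal sketch. Everything below (the reward functions, thresholds, and candidate strings) is a hypothetical placeholder, not the thesis's actual decoding objective:

```python
# Satisficing-style candidate selection at decode time (illustrative only):
# maximize a primary reward among candidates that clear every secondary
# threshold; fall back to the unconstrained best if nothing is feasible.

def satisficing_select(candidates, primary_reward, secondary_rewards, thresholds):
    feasible = [
        c for c in candidates
        if all(r(c) >= t for r, t in zip(secondary_rewards, thresholds))
    ]
    pool = feasible if feasible else candidates
    return max(pool, key=primary_reward)

# Toy usage: primary reward = a helpfulness proxy (length),
# secondary reward = a crude safety proxy with threshold 0.5.
cands = ["short", "a longer answer", "a very long unsafe answer!!"]
safety = lambda c: 0.0 if "unsafe" in c else 1.0
print(satisficing_select(cands, len, [safety], [0.5]))  # -> a longer answer
```

The design point is that secondary criteria act as constraints rather than weighted terms in a scalarized objective, so improving the primary objective can never buy its way past a safety threshold.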
We next investigate test-time scaling for reasoning models, revealing that extended thinking suffers from diminishing and eventually negative returns due to increasing output variance. We propose “parallel thinking”, which distributes the token budget across independent reasoning paths, achieving up to 22% higher accuracy than sequential scaling, and ThinkRetrieve, which injects retrieved solved exemplars into the reasoning trace at each step, maintaining monotonically increasing accuracy as the thinking budget grows.
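The budget-splitting idea can be sketched in a few lines. The `sample_answer` callable below is a hypothetical stand-in for a token-budgeted LLM call; only the split-and-vote logic is illustrated, not the thesis implementation:

```python
# "Parallel thinking" sketch: divide a fixed token budget across several
# independent reasoning paths and majority-vote their final answers, rather
# than spending the whole budget on one long sequential chain.
from collections import Counter

def parallel_think(sample_answer, total_budget, n_paths):
    per_path = total_budget // n_paths              # equal budget split
    answers = [sample_answer(per_path) for _ in range(n_paths)]
    return Counter(answers).most_common(1)[0][0]    # majority vote

# Toy "model": a path succeeds whenever it gets at least 500 tokens.
toy = lambda budget: "42" if budget >= 500 else "incomplete"
print(parallel_think(toy, 4000, 8))  # -> 42
```

Because each path is shorter, the variance that accumulates over very long sequential traces is bounded per path, and aggregation across paths recovers accuracy.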
Lastly, we address the performance gap for low-resource languages. PromptRefine improves generation quality by up to 2.1x through cross-lingual retriever training with diversity-aware fine-tuning, and RELIC improves reward model reliability by up to 24% through pairwise ranking-aligned retrieval.
Overall, this thesis develops and evaluates a suite of inference-time methods, spanning controlled decoding, safety steering, retrieval-augmented reasoning, and cross-lingual example selection, that collectively demonstrate that alignment need not be a static property instilled during training but can be a dynamic capability exercised at deployment time.