PhD Proposal: Towards Reliable Reasoning and Alignment in Large Models

Talk
Aakriti Agrawal
Time: 12.17.2025, 11:00 to 12:00

Large Language Models (LLMs) have achieved remarkable progress in reasoning and alignment, largely driven by post-training methods such as reinforcement learning and reward-model–based supervision. However, such guidance is often imperfect, or entirely unavailable, which limits the reliability of these models. This thesis aims to make large models more reliable both with and without external guidance.
First, we study process supervision, which is used primarily for Large Reasoning Models (LRMs), and show that Process Reward Models (PRMs) often overcredit incorrect steps, leading to false positives and policy misalignment. We provide a formal analysis of this issue and introduce an Overcredit Contrastive (OC) loss, which penalizes false positives to produce more precise step-level signals and better-aligned reasoning policies.
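To make the idea concrete, here is a minimal sketch (PyTorch-style Python) of one plausible step-level objective: a standard binary cross-entropy term plus an extra penalty on overcredited steps, i.e. incorrect steps that still receive a high score. The hinge form, the margin, and the weight lam are illustrative assumptions, not the OC loss proposed in the thesis.

    import torch
    import torch.nn.functional as F

    def overcredit_penalized_loss(step_scores, step_labels, margin=0.1, lam=1.0):
        """Illustrative step-level loss: BCE plus a hinge penalty on
        overcredited steps, i.e. incorrect steps (label 0) whose predicted
        correctness probability exceeds `margin`. `margin` and `lam` are
        placeholder hyperparameters, not values from the thesis."""
        bce = F.binary_cross_entropy(step_scores, step_labels)
        # Extra penalty only on false positives: score mass on incorrect steps.
        overcredit = torch.relu(step_scores - margin) * (1.0 - step_labels)
        return bce + lam * overcredit.mean()

    # Toy usage: PRM scores for 4 reasoning steps; the last two are incorrect.
    scores = torch.tensor([0.9, 0.8, 0.7, 0.6])
    labels = torch.tensor([1.0, 1.0, 0.0, 0.0])
    print(overcredit_penalized_loss(scores, labels))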
Next, we investigate verifier-free improvements at both inference and training time. (i) In Uncertainty-Aware Answer Selection, we calibrate token log-likelihoods across diverse LLMs to identify the most reliable response, consistently outperforming majority voting and single-model self-consistency at comparable cost; the method relies solely on model confidence, with no external verifier. (ii) In EnsemW2S, we propose a token-level weak-to-strong (W2S) ensembling approach that iteratively corrects the weaknesses of smaller experts using a small set of labeled examples and then uses the refined ensemble to supervise stronger students, improving generalization to out-of-distribution and high-difficulty reasoning tasks.
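As a rough illustration of (i), the sketch below length-normalizes each candidate's token log-likelihoods, rescales them with a simple per-model temperature, and picks the highest-scoring answer. The temperature-scaling calibration, the function name select_answer, and the toy inputs are assumptions for illustration only; the actual calibration in the talk may differ.

    import math

    def select_answer(candidates, temperatures):
        """Pick the answer with the highest calibrated confidence.
        `candidates`: list of (model_name, answer_text, token_logprobs).
        `temperatures`: per-model calibration temperatures (illustrative)."""
        best_answer, best_score = None, -math.inf
        for model, answer, logprobs in candidates:
            # Length-normalized log-likelihood, rescaled by the model's temperature.
            score = sum(logprobs) / (len(logprobs) * temperatures[model])
            if score > best_score:
                best_answer, best_score = answer, score
        return best_answer, best_score

    # Toy example with made-up log-probabilities from two hypothetical models.
    cands = [("model_a", "42", [-0.2, -0.1, -0.3]),
             ("model_b", "41", [-0.05, -0.4, -0.6])]
    print(select_answer(cands, {"model_a": 1.0, "model_b": 1.3}))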
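For (ii), the following sketch shows the token-level ensembling idea at its simplest: the weak experts' next-token distributions are mixed with weights estimated from a small labeled set, and the mixed distribution supervises a stronger student. The fixed weights and plain weighted averaging are placeholders; EnsemW2S's iterative correction of the weak experts is not reproduced here.

    import numpy as np

    def ensemble_token_distribution(expert_probs, expert_weights):
        """Mix the experts' next-token distributions with weights derived
        from their accuracy on a small labeled set. The weighting scheme
        here is illustrative, not the EnsemW2S procedure itself."""
        w = np.asarray(expert_weights, dtype=float)
        w = w / w.sum()                    # normalize mixture weights
        probs = np.asarray(expert_probs)   # shape: (n_experts, vocab_size)
        return (w[:, None] * probs).sum(axis=0)

    # Two weak experts over a toy 4-token vocabulary; expert 0 is more reliable.
    p = [[0.6, 0.2, 0.1, 0.1],
         [0.3, 0.4, 0.2, 0.1]]
    mixed = ensemble_token_distribution(p, expert_weights=[0.7, 0.3])
    print(mixed, mixed.argmax())  # the mixed distribution supervises the student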
Finally, we extend our analysis to vision–language models (VLMs) and show that rebalancing attention toward visual evidence improves inter-modal alignment, significantly reducing hallucinations and strengthening factual consistency.
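As a loose illustration of the attention-rebalancing idea, the sketch below adds log(alpha) to the attention logits of image tokens before the softmax, which multiplies their pre-softmax weight by alpha so that generation attends more to visual evidence. The additive form, the value of alpha, and the function name are assumptions rather than the mechanism studied in the thesis.

    import torch

    def rebalance_attention(attn_logits, image_token_mask, alpha=1.5):
        """Boost the attention logits of image tokens by log(alpha), i.e.
        scale their unnormalized attention weight by alpha before softmax.
        `alpha` and this additive-bias form are illustrative assumptions."""
        bias = torch.where(image_token_mask,
                           torch.log(torch.tensor(alpha)),
                           torch.tensor(0.0))
        return torch.softmax(attn_logits + bias, dim=-1)

    # Toy: 6 key tokens, the first 3 are image tokens.
    logits = torch.randn(6)
    mask = torch.tensor([True, True, True, False, False, False])
    print(rebalance_attention(logits, mask))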