I am an AI safety researcher focused on reasoning, scalable oversight, alignment, self-improvement, and AI control. My work addresses critical safety challenges, including identifying and mitigating reward hacking in large reasoning models (LRMs), safe process supervision, weak-to-strong generalization with multiple LLMs, and using interpretability to reduce hallucinations. I also have a background in agentic world modeling, multi-agent reinforcement learning (MARL), and robustness, with deep expertise across LLMs, diffusion LLMs, and vision-language models (VLMs). I am highly motivated to conduct research that addresses agentic misalignment, improves capabilities, and reduces catastrophic risks from frontier models.