Hi! I'm a third-year CS Ph.D. student at the University of Maryland, College Park, working with Abhinav Shrivastava and Yaser Yacoob. I have broad interests in vision and language tasks, including image/video captioning, multimodal semantic alignment, fact-checking, and document understanding. My recent focus is on building customizable large models that follow human intent.