PhD Defense: Learning the Physical World Structure from Data

Talk
Hadi Alzayer
Time: 04.13.2026 12:00 to 14:00
Location: IRB 5105

Cameras capture rich signals that we often miss. In this thesis, we used physical priors to explicitly extract information from those hidden signals, and used large-scale generative models to implicitly model the interaction between objects and their surrounding environments.

In the first part, we used principles from computational photography to extract information from subtle motion, reflections, and defocus cues. We built systems that magnify subtle motion to make it easier for users to see, exploit unnoticed reflections to perform non-line-of-sight imaging, and recover scene details lost to defocus.
In the second part, we used large-scale generative models to directly learn the structure of the world without explicit physical priors. We trained an image editing model on large-scale video data to understand the spatial relationships between objects in a scene and their environments. We then moved from image space to 3D by building a large generative model that implicitly captures the material, illumination, and geometry of objects, bypassing traditional graphics pipelines. Next, we generalized this 3D generative prior beyond lighting to 3D spatial editing and stylization. Finally, we moved from static to dynamic scenes by training a video model that simulates the interaction between embodied agents and their environments in a general, embodiment-agnostic setting.