PhD Defense: Toward Adaptive and Efficient Visual Synthesis of Appearance, Dynamics and Semantics

Talk
Yixuan Ren
Time: 04.14.2026, 12:15 to 14:00

Generative modeling for visual content has rapidly evolved, enabling the synthesis of novel images and videos as well as the transformation of existing assets. In recent years, diffusion models have played a central role in advancing these capabilities. While large-scale models trained on extensive data achieve remarkable quality and surprising generalization, their limited controllability and high complexity remain significant challenges. In this thesis, we investigate adaptive and efficient methods for a range of image editing and video generation tasks.
We begin by studying content-adaptive image color editing, where auxiliary color restoration tasks are introduced to capture users’ chromatic preferences. We then propose the one-shot video motion customization task, which adapts a pre-trained text-to-video diffusion model to a single reference video so that it synthesizes the reference motion with novel subjects and scenes. By analyzing spatiotemporal disentanglement along denoising timesteps, we further show that motion customization can be simplified to require no additional components or training procedures. Beyond diffusion in conventional latent spaces, we explore implicit video tokenization and diffusion models based on implicit neural representations, which synthesize videos holistically by generating neural weights, yielding compact representations and efficient generation. Finally, we present a noise-free flow-matching framework that directly evolves the source image latent toward the target image latent for precise and efficient instructional image editing, suggesting a path toward unification across generative paradigms.
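To give a rough sense of the last point, here is a minimal sketch of a noise-free flow match between paired latents (notation ours, not taken from the thesis; c denotes the edit instruction): a velocity field v_θ is learned along the straight-line path between the source and target latents,

    x_t = (1 - t) x_src + t x_tgt,   t ∈ [0, 1],

by regressing the constant ground-truth velocity,

    L(θ) = E_{t, (x_src, x_tgt)} || v_θ(x_t, t, c) - (x_tgt - x_src) ||²,

and inference integrates the ODE dx/dt = v_θ(x, t, c) from the source latent at t = 0 to the edited latent at t = 1, so no Gaussian noise endpoint is ever sampled.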