PhD Proposal: Taming Generative Priors for Consistent and Faithful Video Processing

Yiran Xu
05.16.2024 15:00 to 16:00

IRB-4105

Generative models have shown promising capability in image and video synthesis. A pre-trained generative model, e.g., a Generative Adversarial Network (GAN), provides a powerful generative prior for downstream tasks such as image editing and Image Super-Resolution (ISR). However, applying such models directly to videos often introduces temporal inconsistency, i.e., flickering. They also struggle with out-of-domain (OOD) data. In this proposal, we explore how to use pre-trained image generative models for video tasks. We start with the video semantic editing task and propose a flow-based approach to improve temporal consistency. Next, to enhance the model's editability on OOD data, we propose to decompose the data into an in-distribution component and an out-of-distribution component by leveraging a pre-trained 3D GAN. However, GANs are typically limited to a specific category, e.g., human faces or animals. To handle more generic scenarios, we present VideoGigaGAN, a large-scale Video Super-Resolution (VSR) model that adapts its image counterpart, GigaGAN, to videos and produces detail-rich, temporally stable output on generic data. As future work, we propose to explore generic obstruction removal and to analyze high-frequency flickering in diffusion models.