From Pixels to Systems: How Generative and Agentic Techniques Reshape Computational Imaging, and Beyond
IRB 4105
Large diffusion models have revolutionized text-to-image generation, opening up vast opportunities for conditional tasks such as image editing, restoration, and content creation. Yet many computational imaging problems, especially those requiring real-time, on-device execution, struggle to adopt such large-scale generative models. In this talk, I will trace our efforts to bridge this gap. I will begin with efficient architectures for image enhancement (MAXIM, CVPR 2022), then discuss our recent advances in leveraging pre-trained diffusion models for versatile image restoration: conditional diffusion distillation for high-fidelity, accelerated generation (CoDi, CVPR 2024), and language-guided control of pre-trained diffusion models for restoration (SPIRE, ECCV 2024). I will also share our award-winning solution to the NTIRE 2025 short-form UGC video enhancement challenge hosted by Kwai.

Building on these foundations, I will present our latest work, 4KAgent (preprint), which demonstrates how the reasoning and planning capabilities of large language models can orchestrate multiple expert restoration tools to achieve highly realistic 4K upscaling for any image type. This work marks a step toward a broader paradigm of agentic computer vision, shifting the focus from isolated model development to system-level intelligence: vision systems that reason, plan, and act in concert to solve more general, practical problems in the real world.