Unlocking Optimization with Generative AI: Reward-Directed Conditional Diffusion

Talk
Mengdi Wang
Princeton University
Time: 11.08.2023, 12:30 to 13:30
Location: IRB 4105

In this talk, we delve into the world of generative AI, focusing on reward-directed generation through conditional diffusion models, a powerful technique with wide applications in generative AI and transformative potential in optimization and decision-making. We address a common learning scenario in which a dataset contains abundant unlabeled data alongside a smaller subset with noisy reward labels. Our approach trains a reward function on the smaller labeled set and uses it as a pseudolabeler, allowing us to generate samples conditioned on desired rewards while uncovering latent data representations. Theoretical insights highlight the model's ability to sample from the reward-conditioned data distribution and to steer generated populations toward user-specified target rewards, with optimality gaps matching off-policy bandit regret. We emphasize the interplay among reward signal strength, distribution shift, and the cost of off-support extrapolation in achieving near-optimal generative samples, and we provide empirical results on image generation, reinforcement learning, and control.
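The pipeline described above, fitting a reward model on the small labeled set, pseudolabeling the unlabeled data, and then steering generation toward a target reward, can be illustrated with a toy sketch. This is not the talk's actual method (which uses conditional diffusion models); it is a minimal numpy analogue in which the "guided sampler" simply takes noisy gradient steps on the learned reward. All names (`w_hat`, `target`, the linear reward) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: data in R^2 with a linear true reward plus noise.
w_true = np.array([1.0, -0.5])
X_unlabeled = rng.normal(size=(1000, 2))          # large unlabeled set
X_labeled = rng.normal(size=(50, 2))              # small labeled set
y_labeled = X_labeled @ w_true + 0.1 * rng.normal(size=50)  # noisy rewards

# Step 1: fit a reward model on the labeled subset (least squares here).
w_hat, *_ = np.linalg.lstsq(X_labeled, y_labeled, rcond=None)

# Step 2: pseudolabel the unlabeled data with the learned reward model.
pseudo_rewards = X_unlabeled @ w_hat

# Step 3: reward-directed sampling sketch. Starting from noise, take
# noisy gradient steps that pull the sample's predicted reward toward a
# user-specified target, a crude stand-in for reward-conditioned
# diffusion guidance.
target = 2.0
x = rng.normal(size=2)
for _ in range(200):
    grad = 2.0 * (x @ w_hat - target) * w_hat     # grad of (r_hat(x) - target)^2
    x = x - 0.05 * grad + 0.01 * rng.normal(size=2)

print(float(x @ w_hat))  # predicted reward of the generated sample, near the target
```

In the talk's setting, the pseudolabeled data would instead condition a diffusion model, and the gap between the sample's achieved reward and the target reflects exactly the reward-noise and distribution-shift trade-offs the abstract mentions.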