Learning to Understand Visual Data with Minimal Human Supervision
Humans and animals learn to see the world mostly on their own, without supervision, yet today’s state-of-the-art visual recognition systems rely on millions of manually-annotated training images. This reliance on labeled data has become one of the key bottlenecks in creating systems that can attain a human-level understanding of the vast concepts and complexities of our visual world. Indeed, while computer vision research has made tremendous progress, most success stories are limited to specific domains in which lots of carefully-labeled data can be unambiguously and easily acquired.

In this talk, I will present my research in computer vision and deep learning on creating scalable recognition systems that can learn to understand visual data with minimal human supervision. Given the right constraints, I’ll show that one can design learning algorithms that discover and generate meaningful patterns from the data with little to no human supervision. In particular, I’ll focus on algorithms that can localize relevant image regions given only weak image/video-level supervision; hierarchically disentangle and generate fine-grained details of objects; and anonymize sensitive video regions for privacy-preserving visual recognition. I’ll conclude by discussing remaining challenges and future directions.