PhD Defense: Effective Training and Efficient Inference of Deep Neural Networks for Visual Understanding
Since the phenomenal success of deep neural networks (DNNs) on image classification, the research community has been developing wider and deeper networks with complex components for a variety of visual understanding tasks. While such "heavy" models achieve excellent performance, they pose two main challenges: (1) training requires significant computational resources as well as large-scale labeled datasets acquired through a time-consuming and labor-intensive human annotation process; and (2) inference can be slow even on expensive graphics cards due to high model complexity. To address these challenges, we explore improving the effectiveness of training DNNs, so that better performance is achieved under the same computation and/or annotation budget, and improving the efficiency of inference, reducing the computational cost of DNNs while maintaining high accuracy.

In this dissertation, we first propose several approaches for training object recognition and detection models more effectively, including devising noise-aware supervisory signals, developing better semi-supervised learning methods, and analyzing different pre-training techniques. In the second part, we present two adaptive computation frameworks that improve the inference efficiency of 3D convolutional networks and attention-based Vision Transformers for the tasks of image and video classification.
Dr. Larry S. Davis, Dr. Abhinav Shrivastava, Dr. Joseph F. JaJa, Dr. Matthias Zwicker, Dr. David Jacobs