PhD Proposal: Improved Training of Deep Networks for Computer Vision

Talk

Abhay Yadav

Time:

04.27.2018 08:30 to 10:30

Location:

AVW 4424

URL:

https://talks.cs.umd.edu/talks/2047

Deep neural networks have become the state-of-the-art tool for solving many computer vision problems. However, these algorithms face significant computational and optimization challenges. For example, a) training deep networks is not only computationally intensive but also requires considerable manual effort, and b) for some use cases, such as adversarial and binary deep networks, it is difficult even to optimize the network well enough to achieve good performance. In this proposal, we address these challenges by targeting the following closely related problems.

First, we focus on the problem of automating the step-size and decay parameters in the training of deep networks. Classical stochastic gradient methods for optimization rely on noisy gradient approximations that become progressively less accurate as iterates approach a solution. The large noise and small signal in the resulting gradients make it difficult to use them for adaptive step-size selection. We propose alternative “big batch” SGD schemes that adaptively grow the batch size over time to maintain a nearly constant signal-to-noise ratio in the gradient approximation. The high-fidelity gradients enable automated learning rate selection and do not require step-size decay. In addition, big batches can be parallelized across many machines, reducing training time and using resources efficiently.

Second, we propose a stable training method for adversarial deep networks. Adversarial neural networks solve many important problems in data science, but are notoriously difficult to train. These difficulties come from the fact that optimal weights for adversarial nets correspond to saddle points, and not minimizers, of the loss function. The alternating stochastic gradient methods typically used for such problems do not reliably converge to saddle points, and when convergence does happen it is often highly sensitive to learning rates. We propose a simple modification of stochastic gradient descent, based on a prediction step, that stabilizes adversarial networks. We show, both in theory and in practice, that the proposed method reliably converges to saddle points and is stable over a wider range of training parameters than a non-prediction method. This makes adversarial networks less likely to “collapse” and enables faster training with larger learning rates.

Finally, as future work, we propose a new method to binarize both weights and activations in deep networks at run-time. Binarizing both weights and activations can drastically reduce memory size, power consumption, and inference time by replacing computationally intensive convolutions with bitwise operations. This makes binarized networks well suited to deployment on embedded devices, mobile phones, and wearable devices. There has been considerable work in this direction. However, most of these works either show a large performance degradation or have to increase the width of the network by a large margin. We believe that this degradation is due to inefficient optimization methods that require replacing the binarization function with a smooth approximation during the backward pass of gradient descent. Motivated by this observation, we propose a principled optimization formulation that takes into account the difference in the model between the forward and backward passes.
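
The saddle-point difficulty described for adversarial networks can be illustrated on a toy problem. The sketch below is not taken from the proposal: it assumes the objective f(x, y) = x * y (minimized over x, maximized over y, with its only saddle point at the origin) and assumes that the prediction step takes the form of a simple extrapolation of x along its most recent update. The function name `run` and all parameter values are hypothetical.

```python
import math


def run(step: float, iters: int, predict: bool) -> float:
    """Alternating gradient descent/ascent on the toy objective f(x, y) = x * y.

    Returns the final distance of (x, y) from the saddle point (0, 0).
    """
    x, y = 1.0, 1.0
    for _ in range(iters):
        # Descent step on x, using df/dx = y.
        x_new = x - step * y
        # Assumed prediction step: extrapolate x along its last update
        # before it is used in the ascent step. Without prediction, the
        # freshly updated x is used directly.
        x_used = x_new + (x_new - x) if predict else x_new
        # Ascent step on y, using df/dy = x.
        y = y + step * x_used
        x = x_new
    return math.hypot(x, y)


plain = run(step=0.1, iters=2000, predict=False)
predicted = run(step=0.1, iters=2000, predict=True)
print(f"distance to saddle without prediction: {plain:.3f}")
print(f"distance to saddle with prediction:    {predicted:.2e}")
```

On this toy problem, the plain alternating iterates keep orbiting the saddle point without approaching it, while the extrapolated variant contracts toward it. This is only a two-variable, noise-free illustration and says nothing about behavior on real adversarial networks.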

Examining Committee:

Chair: Dr. David Jacobs

Dept. rep: Dr. Thomas Goldstein

Members: Dr. Rama Chellappa
