PhD Defense: Reliable deep learning: a robustness perspective

Talk
Sahil Singla
Time:
12.14.2022, 10:30 to 12:30
Location:
IRB 4109

Deep learning models achieve impressive performance on many benchmark tasks, often surpassing human-level performance. However, performance drops considerably when there is a mismatch between the training and test distributions. A key source of this mismatch is the presence of spurious features in the training set, i.e., features that happen to co-occur with the main object but are not essential for predicting it. A related limitation of these models is their vulnerability to adversarial perturbations: input perturbations, imperceptible to a human, that can arbitrarily change the model's prediction. These limitations can create life-threatening situations when such models are deployed in safety-critical applications. In this dissertation, we develop several algorithms to address these challenges.

We first propose a framework for defending against unforeseen adversarial attacks by approximating the set of all imperceptible adversarial examples using deep neural networks. Next, we study the effect of loss curvature on saliency maps, robust overfitting, and provable adversarial robustness. Following this, we introduce several building blocks for provably 1-Lipschitz neural networks, namely an orthogonal convolution layer, activation functions with an orthogonal Jacobian, learnable Lipschitz pooling layers, and procedures for certifying adversarial robustness. These newly introduced methods lead to significant improvements in both clean and certified robust accuracy across several Lipschitz networks of varying depth.

Finally, we introduce several algorithms for identifying and mitigating failure modes of deep networks. Using the neurons of adversarially robust models as visual attribute detectors, we identify clusters of data with a high error rate when certain visual attributes are either absent or present. Our analysis reveals several ImageNet classes that are highly susceptible to spurious features. To mitigate the discovered failure modes, we introduce a framework that improves model performance on the failure-mode distribution by discovering images from the web that are perceptually similar to the failure-mode images and adding them to the training set.
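For context, certification with 1-Lipschitz networks typically reduces to a margin computation: if the logit map is 1-Lipschitz in ℓ2, no ℓ2 perturbation smaller than (top logit − runner-up logit)/√2 can change the prediction. The sketch below is a minimal illustration of this standard certificate, not code from the dissertation; the function name and toy logits are assumptions for demonstration only.

```python
import numpy as np

def certified_radius_l2(logits: np.ndarray, lipschitz_const: float = 1.0) -> float:
    """Certified l2 radius from the logit margin of an L-Lipschitz classifier:
    radius = (top logit - runner-up logit) / (sqrt(2) * L)."""
    sorted_logits = np.sort(logits)[::-1]          # descending order
    margin = sorted_logits[0] - sorted_logits[1]   # gap between top two classes
    return float(margin / (np.sqrt(2) * lipschitz_const))

# Toy logits from a hypothetical 1-Lipschitz network.
logits = np.array([3.2, 1.7, 0.4, -0.9])
print(certified_radius_l2(logits))  # prediction provably unchanged within this l2 radius
```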

Examining Committee

Chair:

Dr. Soheil Feizi

Dean's Representative:

Dr. Behtash Babadi

Members:

Dr. Tom Goldstein
Dr. Ming Lin
Dr. Rene Vidal (JHU)
Dr. Eric Horvitz (Microsoft)