PhD Proposal: Robustness and Understandability of Deep Models

Amin Ghiasi
12.01.2021 12:00 to 14:00

IRB 4107

Recent studies have proposed that a range of "certified" classifiers can deflect adversarial attacks. In addition to labeling images, certified classifiers produce (when possible) a certificate guaranteeing that the input image is not an ℓp-bounded adversarial example. We present a new attack that exploits not only the labeling function but also the certificate generator. The proposed method applies large perturbations that place images far from a class boundary while maintaining the imperceptibility property of adversarial examples. The proposed "Shadow Attack" causes certifiably robust networks to mislabel an image and simultaneously produce a "spoofed" certificate of robustness.

Many articles have proposed feature visualization and model inversion methods to understand deep networks. However, existing techniques for model inversion typically rely on hard-to-tune regularizers, such as total-variation or feature regularization, which must be individually calibrated for each network to produce adequate images. In this work, we introduce Plug-In Inversion, which relies on a simple set of augmentations and does not require excessive hyper-parameter tuning. Under our proposed augmentation-based scheme, one can use the same set of augmentation hyper-parameters for inverting a wide range of image classification models, regardless of input dimensions or architecture. We illustrate the practicality of our approach by inverting Vision Transformers (ViTs) and Multi-Layer Perceptrons (MLPs) trained on the ImageNet dataset. To the best of our knowledge, no previous work has successfully accomplished these tasks.

Feature visualization produces interpretable images that stimulate the response of each feature map individually. Feature visualization is helpful for networks trained for image-based tasks. Visualization methods help us understand and interpret what networks "see." In particular, they elucidate the layer-dependent semantic meaning of features, with shallow features representing edges and deep features representing objects. While this approach has been reasonably practical for vision models, our understanding of networks for processing auditory inputs, such as automatic speech recognition (ASR) models, is limited because their inputs are not visual. In this article, we seek to understand what ASR networks hear. To this end, we consider methods to sonify, rather than visualize, their feature maps. Sonification is the process of using sounds (as opposed to visuals) to apprehend complex structures. By listening to audio, we can better understand what networks respond to and how this response varies with depth and feature index. We hear that shallow features respond strongly to simple sounds and phonemes, while deep layers react to complex words and word parts. We also observe that shallow layers preserve speaker-specific nuisance variables like pitch, while the deep layers eschew these attributes and focus on the semantic content of speech.

Examining Committee:

Chair: Dr. Thomas Goldstein

Department Representative: Dr. John Dickerson

Members: Dr. David W. Jacobs