PhD Proposal: Scalable Adversarial Machine Learning

Ali Shafahi
05.02.2019 13:00 to 15:00

IRB 4107

In this study, we focus on vulnerabilities of deep neural networks in the image classification domain, where the attacker’s goal is to cause misclassification of a given target image at test time. In this context, we first study a new class of poisoning attacks that can cause such misclassification. We call this a “clean-labeled” poisoning attack because the poison images are given the correct label. We show how these poisoning attacks work in both transfer learning and end-to-end training scenarios. We then shift our focus to a more common class of attacks, adversarial examples, which assume that the attacker can manipulate the image at test time. We study both per-instance attacks and universal attacks. For per-instance adversarial examples, we study how cheap regularization methods can increase the robustness of neural networks. In particular, we show that robustness can be gained by aggressively enforcing logit squeezing and label smoothing. We argue that both of these methods work by forcing the logits to be similar, and that robustness can be achieved at little cost by instead enforcing logit similarity. With logit similarity, regularized models can be as robust as adversarially trained models on the CIFAR-10 and CIFAR-100 datasets under strong attacks with many iterations.

For universal attacks, we propose a simple optimization-based attack that reduces the top-1 accuracy of various network architectures on ImageNet to less than 20%, while learning the universal perturbation 13x faster than the standard method. To defend against these perturbations, we propose universal adversarial training, which models the problem of robust classifier generation as a two-player min-max game. We solve the min-max problem using alternating stochastic gradient methods. Training the robust model takes 2x the time of natural training. We also propose a simultaneous stochastic gradient method that is almost free of extra computation, which allows us to perform universal adversarial training on ImageNet. Both algorithms train models that are robust against universal perturbations.

Examining Committee:

Chair: Dr. Tom Goldstein
Dept rep: Dr. John Dickerson
Members: Dr. Tudor Dumitras