PhD Proposal: Scalable Methods for Robust Machine Learning
Adversarial attacks are small, often imperceptible distortions to the inputs of machine learning systems, crafted to substantially change the system's output. These attacks pose a real security threat, and are especially concerning when machine learning systems are used in safety-critical applications. Certifiably robust classification techniques have therefore been developed: for each input sample, a certified classifier produces both a classification and a certificate, a guaranteed lower bound on the magnitude of any perturbation required to change that classification. Existing techniques for certifiable robustness have significant limitations, which we address in this work:

(i) Randomized smoothing techniques are currently the only certification techniques viable for large-scale image classification (i.e., ImageNet). However, randomized smoothing generally provides only high-probability, rather than exact, certificates. Furthermore, producing higher-probability (i.e., less likely to be incorrect) certificates requires greater computational power, potentially leading to environmental and sustainability concerns. To address this, we develop deterministic randomized smoothing-based algorithms, which produce exact, rather than high-probability, certificates at finite computational cost. In particular, we present Deterministic Smoothing with Splitting Noise (DSSN), a certification method for the L_1 metric that is the first deterministic method in this metric to scale to ImageNet, while significantly outperforming prior randomized methods.

(ii) Certification results apply only to particular metrics of perturbation size (for example, the L_1 or L_2 metrics). There is therefore a need to develop new techniques that provide provable robustness against different types of attacks.
In this work, we develop randomized smoothing-based algorithms for several new types of adversarial perturbation, including Wasserstein adversarial attacks, patch adversarial attacks, and L_p adversarial attacks for p < 1. The methods developed for patch and L_p (p < 1) attacks are also deterministic, allowing for efficient exact certification.

(iii) Most work in certified robustness focuses on the inference-time classification setting, where the perturbation is applied to a sample in order to change a classifier's output on that sample. We extend robust learning methods to new settings and applications of machine learning. In particular, we develop a certified defense against data poisoning attacks, in which the attacker makes small changes to the data used to train a model, rather than to the samples targeted for misclassification. Continued work will focus on extending robust machine learning methods to additional settings, such as reinforcement learning.
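To make the high-probability certificates of point (i) concrete, the following is a minimal Monte Carlo sketch of randomized smoothing certification in the L_2 metric, in the style of Cohen et al. (2019). The toy base classifier, sample count, and parameter values are illustrative assumptions for this sketch, not part of the proposed DSSN method: the certificate below holds only with probability at least 1 - alpha, and tightening alpha requires more samples, which is exactly the cost that a deterministic method avoids.

```python
import numpy as np
from scipy.stats import beta, norm

def toy_base_classifier(x):
    # illustrative stand-in for a trained network: class 1 iff mean coordinate > 0
    return int(np.mean(x) > 0)

def certify_l2(x, sigma=0.5, n=1000, alpha=0.001, seed=0):
    """Monte Carlo certification sketch: sample Gaussian perturbations of x,
    take a majority vote of the base classifier, and lower-bound the top-class
    probability. Returns a (prediction, certified L_2 radius) pair that is
    correct with probability >= 1 - alpha."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(2, dtype=int)
    for _ in range(n):
        votes[toy_base_classifier(x + sigma * rng.standard_normal(x.shape))] += 1
    top = int(np.argmax(votes))
    k = int(votes[top])
    # Clopper-Pearson lower confidence bound on the top-class probability
    p_lo = beta.ppf(alpha, k, n - k + 1)
    if p_lo <= 0.5:
        return top, 0.0                  # abstain: no nontrivial certificate
    return top, sigma * norm.ppf(p_lo)   # certified radius = sigma * Phi^{-1}(p_lo)

x = np.full(10, 1.0)  # a point far from the toy decision boundary
label, radius = certify_l2(x)
```

Note the trade-off visible in the code: a larger n or looser alpha changes p_lo, and with it the certified radius, even though the underlying classifier is unchanged.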
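For the data poisoning setting of point (iii), one common construction, in the spirit of partition-based aggregation defenses such as Deep Partition Aggregation, deterministically partitions the training set, trains one base learner per partition, and takes a majority vote. The hash-based partitioning, the nearest-centroid base learner, and all names below are simplified illustrative assumptions, not the exact proposed defense:

```python
import hashlib
import numpy as np

def partition_id(sample_bytes, k):
    # deterministic partition assignment: hash the training sample itself,
    # so a poisoned sample can only ever affect one partition
    return int(hashlib.sha256(sample_bytes).hexdigest(), 16) % k

class NearestCentroid:
    # tiny base learner: one centroid per class, nearest centroid wins
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.centroids = np.array([X[y == c].mean(axis=0) for c in self.classes])
        return self
    def predict(self, x):
        return self.classes[np.argmin(np.linalg.norm(self.centroids - x, axis=1))]

def dpa_predict(X, y, x_test, k=5):
    """Train one base learner per partition and majority-vote on x_test.
    Each poisoned training sample lands in exactly one partition, so it can
    flip at most one vote; the vote gap yields a certified poisoning budget."""
    parts = np.array([partition_id(X[i].tobytes(), k) for i in range(len(X))])
    votes = {}
    for p in range(k):
        mask = parts == p
        if mask.sum() == 0:
            continue
        pred = int(NearestCentroid().fit(X[mask], y[mask]).predict(x_test))
        votes[pred] = votes.get(pred, 0) + 1
    counts = sorted(votes.values(), reverse=True)
    gap = counts[0] - (counts[1] if len(counts) > 1 else 0)
    # (ignoring tie-breaking details) the prediction is unchanged by any
    # attack that inserts, removes, or modifies fewer than gap/2 samples
    return max(votes, key=votes.get), gap // 2

rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(-2.0, 0.1, (50, 3)), rng.normal(2.0, 0.1, (50, 3))])
y_train = np.array([0] * 50 + [1] * 50)
pred, cert = dpa_predict(X_train, y_train, np.array([2.0, 2.0, 2.0]))
```

The certificate here is deterministic and combinatorial (a count of tolerated poisoned samples) rather than a perturbation radius, illustrating how the certified-robustness framework transfers from inference-time attacks to training-time attacks.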
Dr. Soheil Feizi
Dr. David Jacobs
Dr. Thomas Goldstein