PhD Proposal: Towards Generalizable Learning under Distribution Shifts

Talk
Manli Shu
Time: 
04.10.2023 14:30 to 16:30
Location: 
IRB 4237

How to make a machine learning model generalizable is a fundamental problem that has been studied throughout the development of machine learning and deep learning, across emerging neural network architectures, learning methods, and real-world applications. A simple example of generalization is making a trained model generalize to held-out test data, where we assume only a subtle difference between the training and test data distributions. Other generalization problems tackle more prominent distribution shifts, for example, when a model trained on clean natural images is tested on images with domain shifts. In this proposal, we look into ways to improve the generalization of machine learning models under various types of data distribution shifts.

First, we start with a straightforward solution: data augmentation, which alleviates over-fitting and improves generalization, and with which most models nowadays are trained. In vision tasks, common data augmentations consist of a set of basic image transformations in the pixel space, such as random cropping, rotation, and color jittering. In our first work, we propose an optimizable data augmentation pipeline: instead of applying each augmentation operation randomly, we parameterize it with learnable parameters that control its strength. To account for unseen test distributions, we want our model to make robust predictions under any augmentation. Inspired by the adversarial training literature, we formulate a min-max optimization problem to tune these parameters. For efficiency, we design the augmentation operations to be differentiable, so they can be optimized end-to-end along with the model training.

In the second work, we consider a higher level of augmentation to improve generalization.
Data augmentations are usually restricted to a family of image transformations, and the potency of data augmentation can be limited by the choice of those transformations. On the other hand, image feature representations have been shown to be connected to properties of the pixel space. Specifically, the feature normalization statistics (i.e., the channel-wise mean and standard deviation) encode information about the "style" of an image. Our second work proposes a feature-space augmentation method that directly perturbs these normalization statistics. Since our goal is to make the model generalizable to arbitrary domain shifts, we train the model on "worst-case" styles by perturbing the features adversarially.

With the recent development of large foundation models, generalization seems to be less of a problem. For example, CLIP, a vision-language model trained on millions of text-image pairs from the internet, can perform zero-shot classification on different datasets and shows competitive performance on common out-of-distribution (OOD) benchmarks. Its zero-shot predictions are made with the help of a textual prompt, and designing such prompts thus plays a crucial role in applying foundation models to downstream tasks. However, in our third work, we identify several limitations of existing prompting techniques from the generalization perspective. We propose a new prompting paradigm that tunes the prompt at test time. Our method requires no additional data or annotations, thus retaining a pre-trained model's zero-shot generalization ability.
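To make the first work's min-max formulation concrete, here is a minimal NumPy sketch: a toy linear model is trained to minimize the loss while a single augmentation-strength parameter is updated by gradient ascent to maximize it. The scaling augmentation and all names here are illustrative, not the proposal's actual pipeline.

```python
import numpy as np

# Toy min-max training: a linear model y = w * x with one learnable
# augmentation "strength" s that rescales inputs, x' = (1 + s) * x.
rng = np.random.default_rng(0)
x = rng.normal(size=64)
y = 2.0 * x                      # ground-truth slope is 2

w, s = 0.0, 0.1                  # model weight, augmentation strength
lr_w, lr_s = 0.1, 0.01

def loss_and_grads(w, s, x, y):
    xa = (1.0 + s) * x           # differentiable augmentation
    err = w * xa - y
    loss = np.mean(err ** 2)
    dw = np.mean(2 * err * xa)   # d(loss)/dw
    ds = np.mean(2 * err * w * x)  # d(loss)/ds
    return loss, dw, ds

for step in range(200):
    loss, dw, ds = loss_and_grads(w, s, x, y)
    w -= lr_w * dw               # model parameters: minimize the loss
    s += lr_s * ds               # augmentation strength: maximize it
    s = float(np.clip(s, 0.0, 0.5))  # keep the augmentation bounded
```

Because the augmentation is differentiable, both updates reuse the same backward pass, which is what makes the end-to-end optimization cheap.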
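The second work's feature-statistics ("style") perturbation can be sketched as follows, assuming a NumPy feature map of shape (C, H, W). The offsets `delta_mu` and `delta_sigma` are fixed here for illustration, whereas the method would choose them adversarially to realize a worst-case style.

```python
import numpy as np

def perturb_feature_stats(feat, delta_mu, delta_sigma, eps=1e-6):
    """Replace each channel's mean/std with perturbed values.

    feat: (C, H, W) feature map; delta_mu, delta_sigma: (C,) offsets.
    In the proposal these offsets would be found adversarially; here
    they are fixed inputs for illustration.
    """
    mu = feat.mean(axis=(1, 2), keepdims=True)        # (C, 1, 1)
    sigma = feat.std(axis=(1, 2), keepdims=True) + eps
    normalized = (feat - mu) / sigma                  # zero-mean, unit-std
    new_mu = mu + delta_mu.reshape(-1, 1, 1)
    new_sigma = sigma + delta_sigma.reshape(-1, 1, 1)
    return new_sigma * normalized + new_mu            # re-styled features

rng = np.random.default_rng(0)
f = rng.normal(size=(3, 8, 8))
out = perturb_feature_stats(f, delta_mu=np.ones(3),
                            delta_sigma=0.5 * np.ones(3))
```

Operating on statistics rather than pixels lets a single perturbation express style changes that no fixed set of pixel-space transformations covers.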
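For the third work, a label-free test-time signal can be illustrated with a small sketch: given classification logits from several augmented views of one test image, the entropy of the averaged prediction is low exactly when the views agree confidently. Treating this marginal entropy as the tuning objective is an assumption made here for illustration, not necessarily the proposal's exact formulation.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def marginal_entropy(view_logits):
    """Entropy of the class distribution averaged over augmented views.

    view_logits: (num_views, num_classes). A test-time prompt tuner
    could update the prompt so this quantity decreases per test sample,
    using no labels or extra data.
    """
    p = softmax(view_logits).mean(axis=0)
    return float(-(p * np.log(p + 1e-12)).sum())

consistent = np.array([[5.0, 0.0, 0.0], [4.0, 0.5, 0.0]])   # views agree
conflicting = np.array([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]])  # views disagree
```

Since the objective needs only the model's own predictions, tuning it at test time leaves the pre-trained zero-shot model otherwise untouched.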

Examining Committee

Chair:

Dr. Tom Goldstein

Department Representative:

Dr. Furong Huang

Members:

Dr. Tianyi Zhou