Towards AI Alignment: Advancing Fairness, Reliability, and Human-Like Perception in AI

Talk
Bang An
Time: 
05.02.2024 12:30 to 14:00
Location: 
IRB 4107

Abstract:

As artificial intelligence (AI) increasingly shapes many aspects of society, ensuring its trustworthiness and alignment with human values has become crucial. This study advances AI alignment by enhancing fairness, reliability, and human-like perception in AI.

First, we tackle the challenge of maintaining fairness in an ever-changing world. Recognizing that the common assumption of identical training and test data distributions is often unrealistic, we introduce a technique that keeps models unbiased even under distribution shifts. Second, we explore enhancing the understanding capabilities of Vision Language Models (VLMs) by mimicking human visual perception. Our training-free method improves both the accuracy and robustness of zero-shot visual classification. Lastly, we delve into the reliability of generative AI. We benchmark the robustness of image watermarks used to identify AI-generated images; the benchmark reveals several critical vulnerabilities of popular watermarks and guides the development of more secure ones. Ongoing work focuses on the reliability of Large Language Models (LLMs). We observe exaggerated safety behavior in many LLMs and propose an approach to automatically identify falsely refused prompts. Building on this technique, we discuss future work on fine-grained alignment.

This study confronts key challenges in AI alignment, enriching the knowledge base needed to direct AI technology toward maximizing societal benefits and minimizing risks. It offers comprehensive approaches to enhancing AI systems, aligning them more closely with human values, and paving the way for future innovations in reliable AI.

Examining Committee

Chair:

Dr. Furong Huang

Department Representative:

Dr. Jia-Bin Huang

Members:

Dr. Hal Daumé