PhD Proposal: Recognizing Object-Centric Attributes and Relations

Talk

Khoi Pham

Time:

05.04.2023 09:30 to 11:30

Location:

IRB 4105 or https://umd.zoom.us/j/2313749016?pwd=UjZMZFEwejM0d28wamVaWU5aalphdz09

URL:

https://talks.cs.umd.edu/talks/3521

Recognizing an object's attributes, such as color and shape, and its relations to other objects in an environment is an innate human ability that allows us to interact effortlessly with the world. Even when faced with unfamiliar objects or objects whose appearance changes over time, humans still excel at identifying them based on their attributes and relations. The goal of our research is to equip computer vision systems with this human-like recognition ability, so that they understand the attributes and relations of objects and become more robust to real-world scene complexity. In this thesis, we present contributions to object-centric attribute and relation recognition, structured in two parts.

The first part focuses on recognizing attributes of objects, a domain where current research is often constrained by domain-specific attributes and by small-scale, noisy data. We introduce a large-scale attribute dataset that is diverse but challenging due to label sparsity and data imbalance. To mitigate these challenges, we propose techniques that handle class imbalance, apply an attention mechanism, and use contrastive learning to align objects that share attributes. However, because such a large-scale dataset is expensive to collect, we go on to develop a framework capable of learning attribute prediction from image-text data. The proposed framework can scale up to predict a larger space of attribute concepts, including novel attributes from arbitrary text.

The second part explores relations between objects and examines how the interplay between attributes and relations can help improve image-text alignment. While previous research has relied on cross-attention, we demonstrate that scene graphs enable the design of a dual-encoder framework that is more efficient than cross-attention while being as powerful for image-text alignment. Our approach leverages graph neural networks to develop a scene graph embedding that is rich in both attribute and relation semantics.

Lastly, we discuss our ongoing work on visual relation detection from an object-centric perspective and on open-vocabulary object attribute detection.

Examining Committee

Chair:

Dr. Abhinav Shrivastava

Department Representative:

Dr. Ramani Duraiswami

Members:

Dr. Larry Davis

Dr. Zhe Lin (Adobe Research)
