PhD Proposal: Recognizing Object-Centric Attributes and Relations

Khoi Pham
05.04.2023 09:30 to 11:30

Recognizing an object's attributes, such as color and shape, and its relations to other objects in an environment is an innate human ability that allows us to interact effortlessly with the world. Even when faced with unfamiliar objects or objects whose appearances evolve over time, humans still excel at identifying them based on their attributes and relations. The goal of our research is to equip computer vision systems with this human-like recognition ability, allowing them to understand the attributes and relations of objects and become more robust to real-world scene complexity. In this thesis, we present contributions in object-centric attribute and relation recognition, structured in two parts.
The first part focuses on recognizing attributes of objects, a domain where current research is often constrained by domain-specific attributes and by small-scale, noisy data. We introduce a large-scale attribute dataset that is diverse but challenging due to label sparsity and data imbalance. To mitigate these issues, we propose techniques to handle class imbalance, apply an attention mechanism, and utilize contrastive learning to align attribute-sharing objects. However, because such a large-scale dataset is expensive to collect, we further develop a framework capable of learning attribute prediction from image-text data. The proposed framework can scale up to predict a larger space of attribute concepts, including novel attributes from arbitrary text.
The second part explores relations between objects and examines how the interplay between attributes and relations can help improve image-text alignment. While previous research has relied on cross-attention, we demonstrate that scene graphs enable the design of a dual-encoder framework that is more efficient than cross-attention while being as powerful for image-text alignment. Our approach leverages graph neural networks to develop a scene graph embedding that is rich in both attribute and relation semantics.
Lastly, we discuss our ongoing work on visual relation detection from an object-centric perspective and on open-vocabulary object attribute detection.

Examining Committee


Dr. Abhinav Shrivastava

Department Representative:

Dr. Ramani Duraiswami


Dr. Larry Davis

Dr. Zhe Lin (Adobe Research)