PhD Proposal: Context Driven Scene Understanding

Talk
Speaker: Xi Chen
Time: 12.16.2014, 13:30 to 15:00
Location: AVW 3450

Understanding objects in complex scenes is a fundamental and challenging problem in computer vision. Given an image, we would like to answer whether an object is present, where it is, and, if possible, localize it with a bounding box or pixel-wise labels. Typical object detection and recognition approaches consider only the appearance of the object in a single image. However, due to variation in data distribution, occlusion, and viewpoint change, object models may not always capture the appearance of objects, and ambiguity arises. In this proposal, we present context-driven approaches that leverage relationships between objects in the scene to improve both the accuracy and efficiency of scene understanding.
In the first part, we describe an approach that jointly solves the segmentation and recognition problems using a multiple-segmentation framework with context. Our approach formulates a cost function that combines contextual information with appearance matching. The relaxed cost function is minimized using an efficient quadratic programming solver, and an approximate discrete solution is obtained by discretizing the relaxed one. Experiments demonstrate that the approach improves labeling performance compared to other segmentation-based recognition approaches.
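The relax-and-discretize scheme described above can be sketched as follows. This is an illustrative toy, not the authors' formulation: the function names, the projected-gradient solver (standing in for the QP solver mentioned in the abstract), and the toy costs are all hypothetical. Each segment holds a label-indicator vector relaxed onto the probability simplex, the cost couples per-segment appearance terms with pairwise context terms, and the continuous minimizer is discretized by a per-segment argmax.

```python
import numpy as np

# Hypothetical sketch of relax-and-discretize labeling; names and data
# are illustrative, not the proposal's actual formulation or solver.

def relaxed_labeling(unary, pairwise, adj, iters=200, lr=0.1):
    """unary: (n, k) appearance costs; pairwise: (k, k) context costs;
    adj: list of (i, j) adjacent-segment pairs."""
    n, k = unary.shape
    x = np.full((n, k), 1.0 / k)           # uniform start on the simplex
    for _ in range(iters):
        grad = unary.copy()
        for i, j in adj:                   # gradient of the context term
            grad[i] += pairwise @ x[j]
            grad[j] += pairwise.T @ x[i]
        x = np.clip(x - lr * grad, 0.0, None)
        x /= x.sum(axis=1, keepdims=True)  # renormalize (approximate
                                           # simplex projection)
    return x.argmax(axis=1)                # discretize the relaxed solution

unary = np.array([[0.2, 0.9], [0.8, 0.3], [0.6, 0.4]])  # 3 segments, 2 labels
pairwise = np.array([[0.0, 1.0], [1.0, 0.0]])           # penalize disagreement
labels = relaxed_labeling(unary, pairwise, [(0, 1), (1, 2)])
print(labels)
```

The argmax rounding is the discretization step from the abstract: the relaxed solution lives on the simplex, and each segment takes its highest-weight label.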
With the recent surge in photos and videos taken with hand-held devices and shared online, many of which capture the same scenes, the need to automatically label objects in such image sets has emerged. Traditional approaches to recognition typically consider only a single test image, using appearance and contextual cues. However, modeling relationships between objects is difficult because such relationships are viewpoint-dependent and do not generalize well. We introduce a new problem called object co-labeling, where the goal is to jointly annotate multiple images of the same scene that lack temporal consistency. We present an adaptive framework for joint segmentation and recognition to solve this problem, based on an objective function that considers not only appearance but also appearance and context consistency across images of the scene. Our approach improves labeling performance compared to labeling each image individually. We also show applications of the co-labeling framework to other recognition problems, such as label propagation in videos and object recognition in similar scenes.
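A minimal sketch of the kind of co-labeling objective described above, under assumed toy data: regions matched across images of the same scene should take consistent labels, so the energy adds a cross-image disagreement penalty to the usual per-image appearance cost. The function name, the brute-force minimizer, and the correspondences are illustrative, not the proposal's actual framework.

```python
import itertools
import numpy as np

# Illustrative co-labeling energy (all names and data hypothetical):
# per-image appearance costs plus a penalty whenever matched regions
# in different images of the scene disagree on their label.

def colabel(app_costs, matches, lam=1.0):
    """app_costs: list of (n_regions, k) arrays, one per image;
    matches: (img_a, region_a, img_b, region_b) correspondences."""
    sizes = [c.shape[0] for c in app_costs]
    k = app_costs[0].shape[1]
    best, best_energy = None, np.inf
    # brute force over joint labelings -- only feasible for toy sizes
    for lab in itertools.product(range(k), repeat=sum(sizes)):
        labs, off = [], 0
        for n in sizes:
            labs.append(lab[off:off + n]); off += n
        e = sum(c[i, labs[m][i]] for m, c in enumerate(app_costs)
                for i in range(c.shape[0]))
        e += lam * sum(labs[a][i] != labs[b][j] for a, i, b, j in matches)
        if e < best_energy:
            best, best_energy = labs, e
    return best

img1 = np.array([[0.1, 0.9], [0.5, 0.5]])       # image 1: two regions
img2 = np.array([[0.6, 0.4], [0.2, 0.8]])       # image 2: two regions
labels = colabel([img1, img2], [(0, 0, 1, 0)])  # the region 0s correspond
print(labels)  # -> [(0, 0), (0, 0)]
```

Note how the consistency term flips image 2's first region to label 0: alone its appearance slightly favors label 1, but agreeing with the matched region in image 1 is cheaper overall, which is the benefit of labeling the scene's images jointly rather than individually.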
Usually, context-sensitive methods consider relations of the query object with all other object classes in the scene at the same time. This is highly inefficient, since many non-informative contextual objects must be queried. We propose a new strategy for simultaneous object detection and segmentation in the scene. Instead of evaluating classifiers at all possible object locations in the image, we develop a divide-and-conquer approach that sequentially poses questions for the computer to answer, given the query and the image, much like playing a game of "Twenty Questions".
Such questions are dynamically selected based on the query, the scene, and the responses observed so far from object detectors and classifiers. We present an efficient object search policy that considers the most informative questions for both the query and the scene. The policy is driven by a semantic contextual model that sequentially refines the search area for the query. We formulate the policy in a probabilistic framework that integrates the current information and the observation history to update the model and determine the next most informative action to take.
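The question-selection idea above can be sketched with a toy information-gain policy. All quantities here are assumptions for illustration, not the proposal's model: maintain a belief over candidate regions for the query, pick the question (i.e., detector to run) whose answer is expected to reduce the entropy of that belief the most, then Bayes-update the belief with the observed response.

```python
import math

# Toy sketch of informative-question selection (quantities hypothetical):
# a belief over candidate regions, Bayes updates from yes/no detector
# responses, and greedy choice of the question with lowest expected
# posterior entropy.

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def update(prior, likelihood, answer):
    """Bayes update; likelihood[r] = P(answer = yes | region r)."""
    post = [p * (l if answer else 1 - l) for p, l in zip(prior, likelihood)]
    z = sum(post)
    return [p / z for p in post]

def expected_entropy(prior, likelihood):
    p_yes = sum(p * l for p, l in zip(prior, likelihood))
    return (p_yes * entropy(update(prior, likelihood, True))
            + (1 - p_yes) * entropy(update(prior, likelihood, False)))

def next_question(prior, questions):
    """Greedily pick the question minimizing expected posterior entropy."""
    return min(questions, key=lambda q: expected_entropy(prior, questions[q]))

prior = [0.25, 0.25, 0.25, 0.25]           # belief over 4 candidate regions
questions = {                              # P(yes | region) per question
    "table_nearby": [0.9, 0.9, 0.1, 0.1],  # splits the regions in half
    "wall_nearby":  [0.5, 0.5, 0.5, 0.5],  # uninformative everywhere
}
print(next_question(prior, questions))     # -> table_nearby
```

The uninformative question leaves the belief unchanged either way, so the policy picks the one that halves the candidate set, which is the "Twenty Questions" intuition of shrinking the search area as quickly as possible.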
Experiments show promising results compared with baselines that search exhaustively, search for objects in random sequences, and search at random locations.
Examining Committee:
Committee Chair: Dr. Larry S. Davis
Dept.'s Representative: Dr. James Reggia
Committee Member(s): Dr. Chellappa