PhD Proposal: Exploiting High Level Knowledge for Incrementally Understanding Scenes and Videos

Talk
Varun Nagaraja
Time: 05.13.2015 11:00 to 12:30
Location: AVW 4424

Structure in scenes and videos has been used extensively to improve performance on tasks such as object detection and event detection. While structure provides high-level knowledge for filtering noise out of low-level detections, it can also serve as a guide for understanding scenes and videos incrementally. The advantage of incremental understanding is that it limits the computation time and resources spent on detection tasks. In the first part of my work, I propose a technique for incrementally constructing Markov networks to perform event detection in basketball videos. In the second part, I propose a technique for incrementally searching regions in an image to detect objects of a query class.
To detect events in a structured scenario like a basketball game, the rules of the game can be applied to remove false-positive events hypothesized by a low-level event detector. Typically, the high-level semantic analysis involves constructing a Markov network over the low-level detections to encode the relationships between them. In complex higher-order networks (e.g. Markov Logic Networks), each low-level detection can take part in many relationships, and the network size grows rapidly with the number of detections. I propose a feedback-based incremental technique to keep the network small. The network is initialized with the detections above a high confidence threshold; then, based on the high-level semantics of this initial network, relevant detections are incrementally selected from the remaining ones that fall below the threshold.
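The incremental selection loop described above can be illustrated with a minimal Python sketch. It is not the proposed method: the Markov Logic Network semantics and inference are replaced here by a hypothetical temporal-proximity rule, and the names Detection, is_relevant, build_incrementally, the window of 2 time steps, and the 0.8 threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    label: str         # event type hypothesized by the low-level detector
    time: int          # time step at which the event was detected
    confidence: float  # detector confidence in [0, 1]


def is_relevant(candidate, network, window=2):
    # Hypothetical stand-in for high-level feedback: a candidate is relevant
    # if it lies within `window` time steps of a detection already in the network.
    return any(abs(candidate.time - d.time) <= window for d in network)


def build_incrementally(detections, threshold=0.8, max_rounds=10):
    # Initialize the network with the high-confidence detections only.
    network = [d for d in detections if d.confidence >= threshold]
    remaining = [d for d in detections if d.confidence < threshold]
    for _ in range(max_rounds):
        # Feedback step: pull in low-confidence detections that the
        # current network marks as relevant.
        selected = [d for d in remaining if is_relevant(d, network)]
        if not selected:
            break
        network.extend(selected)
        remaining = [d for d in remaining if d not in selected]
    return network, remaining


detections = [
    Detection("shot", 10, 0.9),
    Detection("rebound", 11, 0.4),
    Detection("pass", 13, 0.3),
    Detection("foul", 50, 0.2),
]
network, remaining = build_incrementally(detections)
```

In this toy run the isolated low-confidence "foul" detection never enters the network, so the network stays smaller than one built over all four detections.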
When we are interested in locating an object of a particular class, a passive computer vision system would process every region in the image only to output a single small region. Instead, we can use the structure in the scene to search for objects without processing the entire image. I propose a search technique that processes image regions sequentially, so that the regions more likely to contain an object of the query class are explored earlier. The problem is framed as a Markov decision process, and an imitation learning algorithm is used to learn a search strategy. Since structure in the scene is essential for performing an intelligent search, the technique is illustrated on indoor scene images, which contain both unary structure information (depth, height) and spatial context between the objects in the scene.
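The idea of exploring likely regions first can be sketched in Python as a greedy best-first search. This is only an illustration under stated assumptions: the learned search strategy (Markov decision process plus imitation learning) is replaced by a hand-written score, and Region, unary_score, context_score, sequential_search, and every number used are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str      # true label, revealed only when the region is processed
    height: float  # height above the floor in metres (unary cue)
    x: float       # horizontal position in metres


def unary_score(region):
    # Hypothetical prior: the query object tends to sit about 1.0 m above the floor.
    return -abs(region.height - 1.0)


def context_score(region, explored):
    # Hypothetical spatial context: bonus for lying close to an already-found desk.
    return sum(1.0 for e in explored
               if e.name == "desk" and abs(region.x - e.x) <= 0.5)


def sequential_search(regions, query, budget):
    explored, unexplored = [], list(regions)
    while unexplored and len(explored) < budget:
        # Process the most promising unexplored region next.
        best = max(unexplored,
                   key=lambda r: unary_score(r) + context_score(r, explored))
        unexplored.remove(best)
        explored.append(best)
        if best.name == query:
            return best, explored
    return None, explored


regions = [
    Region("floor", 0.0, 1.0),
    Region("desk", 0.9, 2.0),
    Region("poster", 1.2, 5.0),
    Region("monitor", 1.3, 2.2),
]
found, explored = sequential_search(regions, query="monitor", budget=3)
```

Here the desk is processed first, and its context bonus moves the monitor ahead of the poster, so the query object is found after processing two of the four regions.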
Examining Committee:
Committee Chair: Dr. Larry S. Davis
Dept's Representative: Dr. Thomas Goldstein
Committee Member(s): Dr. David Jacobs