Robot Vision for Manipulation Actions
One of the most challenging applications in autonomous systems design is enabling robots to collaborate with humans in daily activities. The large variation in how humans perform manipulation actions, together with the occlusions in video caused by humans moving and interacting with different objects, makes the interpretation of dexterous actions very difficult for computational perception, so powerful generalization mechanisms are necessary. In this talk, I will first give an overview of a robot vision system architecture that places perception in computational feedback loops with cognition and action. Then I will describe two novel vision modules at the core of this architecture for recognizing objects and actions in cluttered environments. While classic approaches to object recognition relate symbolic information directly to visual input, I have studied so-called mid-level grouping processes, which can serve as the interface between image processing and cognition. These have been implemented as image operators that extract objects from images and depth data through attention, segmentation, and recognition processes. While classic Computer Vision approaches treat action recognition as the classification of video segments, Robot Vision operates under real-time constraints and must continuously update its estimate of what is happening now while also predicting what will happen next. I will describe an approach for interpreting dexterous actions whose novelty lies in including forces as a component of the recognition. By learning from data in both the perceptual and motoric spaces, the system acquires a richer representation that supports better action prediction and also provides a new tool for robot learning.
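The idea of perception sitting in a feedback loop with cognition and action, rather than feeding forward into them, can be illustrated with a minimal sketch. All names here (`perceive`, `decide`, the toy scene) are hypothetical and chosen only to show the loop structure: cognition passes an expectation back down to perception, and the chosen action determines what is observed next.

```python
def perceive(observation, expectation):
    """Perception biased by a top-down expectation from cognition.

    If cognition expects a particular object, attend to it first;
    otherwise fall back to an unguided scan of the scene.
    """
    if expectation in observation:
        return expectation
    return observation[0] if observation else None

def decide(percept, goal):
    """Cognition: choose the next action and a new expectation
    to feed back into perception on the next cycle."""
    if percept == goal:
        return "grasp", goal
    return "search", goal

# Closed loop: each cycle, cognition updates the expectation and
# perception uses it on the next frame, until the goal object is found.
scene = ["cup", "knife", "bowl"]
goal, expectation = "knife", None
action = None
for _ in range(3):
    percept = perceive(scene, expectation)
    action, expectation = decide(percept, goal)
    if action == "grasp":
        break
print(action)
```

On the first cycle there is no expectation, so perception returns an arbitrary object; cognition then sets the expectation to the goal, and on the second cycle perception attends directly to it. The point of the sketch is the loop topology, not the toy logic inside it.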
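The contrast between classifying a completed video segment and continuously predicting an ongoing action can also be sketched. The following toy recognizer is an assumption-laden illustration, not the talk's actual method: it fuses one visual feature (hand speed) with one motoric feature (measured grasp force) per frame, and recursively updates a belief over action classes so that a prediction is available at every frame.

```python
import math

# Hypothetical action classes and per-class feature models
# (mean hand speed in m/s, mean grasp force in N); in practice
# such models would be learned from human demonstration data.
ACTIONS = ["cut", "pour", "stir"]
MODELS = {
    "cut":  {"speed": 0.10, "force": 12.0},
    "pour": {"speed": 0.05, "force": 3.0},
    "stir": {"speed": 0.20, "force": 2.0},
}

def likelihood(obs, model, sigma_speed=0.05, sigma_force=2.0):
    """Gaussian likelihood of one (speed, force) observation under a model."""
    ls = math.exp(-0.5 * ((obs["speed"] - model["speed"]) / sigma_speed) ** 2)
    lf = math.exp(-0.5 * ((obs["force"] - model["force"]) / sigma_force) ** 2)
    return ls * lf

def update_belief(belief, obs):
    """One recursive Bayes step: reweight each action by the new frame."""
    new = {a: belief[a] * likelihood(obs, MODELS[a]) for a in ACTIONS}
    z = sum(new.values()) or 1e-12
    return {a: p / z for a, p in new.items()}

# Start from a uniform prior, then stream in two frames showing slow
# motion with high force; the belief shifts toward "cut" frame by frame,
# so a prediction is available before the action is complete.
belief = {a: 1.0 / len(ACTIONS) for a in ACTIONS}
stream = [{"speed": 0.09, "force": 11.0}, {"speed": 0.11, "force": 12.5}]
for obs in stream:
    belief = update_belief(belief, obs)
print(max(belief, key=belief.get))
```

The force channel is what disambiguates here: "pour" and "cut" can look similar visually at this level of abstraction, but only cutting requires sustained high contact force, which is the motivation for including forces in the representation.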