Future Plans

(A) An examination of the lexicon (through WordNet) revealed about 1,300 visual verbs. Many of them, however, belong to equivalence classes (for example, about 50 visual verbs denote different kinds of walking). By considering one visual verb for each class, we arrive at about 50 basic visual verbs. We plan to develop representations (multiview models) for all of them and, at the same time, move towards the analysis of interactions between people. See, for example, the video below and the silhouettes extracted from a "give" action sequence.

 

 

Silhouette Extraction by Background Subtraction
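As a rough illustration of the idea named in the caption above, background subtraction can be sketched as a per-pixel comparison between a frame and a reference image of the empty scene. The sketch below is a minimal version under stated assumptions: grayscale images stored as nested lists, a static background, and a fixed threshold. The function name `extract_silhouette` and the threshold value are hypothetical and are not taken from the project.

```python
def extract_silhouette(frame, background, threshold=30):
    """Return a binary mask: 1 where the frame differs from the background.

    frame, background: grayscale images as lists of rows of intensities.
    threshold: hypothetical cut-off on the absolute intensity difference.
    """
    if len(frame) != len(background):
        raise ValueError("frame and background must have the same height")
    mask = []
    for frame_row, bg_row in zip(frame, background):
        if len(frame_row) != len(bg_row):
            raise ValueError("frame and background must have the same width")
        # A pixel belongs to the silhouette if it changed by more than the threshold.
        mask.append([1 if abs(f - b) > threshold else 0
                     for f, b in zip(frame_row, bg_row)])
    return mask


# Toy 3x4 example: a bright "figure" appears in front of a dark background.
background = [[10, 10, 10, 10],
              [10, 12, 11, 10],
              [10, 10, 10, 10]]
frame = [[10, 11, 10, 10],
         [10, 200, 190, 10],
         [10, 180, 10, 12]]

silhouette = extract_silhouette(frame, background)
for row in silhouette:
    print(row)
```

Small intensity changes (sensor noise, here differences of 1 or 2) fall below the threshold and are ignored. A real system would typically also need to handle lighting changes and shadows, which this sketch does not attempt.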

As before, we can extract keyframes, as shown in the following figure.

Keyframe Extraction
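The keyframe selection method used in the project is not described in this section. Purely as an illustration, one simple heuristic is to keep a frame whenever its silhouette differs enough from the last keyframe. The sketch below assumes binary silhouette masks of equal size; the names `mask_difference` and `select_keyframes` and the `min_change` parameter are hypothetical.

```python
def mask_difference(a, b):
    """Count the pixels at which two equally sized binary masks disagree."""
    return sum(pa != pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def select_keyframes(masks, min_change):
    """Return indices of frames kept as keyframes.

    Frame 0 is always kept. A later frame is kept when it differs from the
    most recently kept frame in at least `min_change` pixels (an assumed
    heuristic, not the project's method).
    """
    if not masks:
        return []
    keyframes = [0]
    for i in range(1, len(masks)):
        if mask_difference(masks[i], masks[keyframes[-1]]) >= min_change:
            keyframes.append(i)
    return keyframes


# Toy sequence of 2x3 silhouette masks.
masks = [
    [[0, 1, 0], [0, 1, 0]],
    [[0, 1, 0], [0, 1, 0]],
    [[0, 1, 1], [0, 1, 0]],
    [[1, 1, 1], [0, 1, 0]],
    [[1, 1, 1], [0, 1, 1]],
    [[1, 0, 1], [1, 1, 1]],
]

keyframe_indices = select_keyframes(masks, min_change=2)
print(keyframe_indices)
```

Comparing against the last keyframe rather than the previous frame means that slow, gradual motion still produces a keyframe once the accumulated change is large enough.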

An analysis of each frame will provide relationships among the different body parts as well as any objects involved. This can be seen in the following figure.

The resulting graph resembles a "cognitive map", and the underlying predicate denoting the action is really a subgraph. Appropriate matching techniques can then give rise to algorithms producing predicates from pixels.
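To make the "predicate as subgraph" idea concrete, the sketch below represents a frame as a small labeled graph and searches it for a pattern graph standing for an action predicate. It is a brute-force illustration only: it tries every injective assignment of pattern nodes to scene nodes, which is feasible only for very small graphs, and it is not the matching technique of the project. All node names, node types, and relation labels ("holds", "reaches_toward", and so on) are hypothetical.

```python
from itertools import permutations


def find_subgraph_match(pattern_nodes, pattern_edges, scene_nodes, scene_edges):
    """Find an injective mapping of pattern nodes onto scene nodes.

    pattern_nodes / scene_nodes: dict mapping node name -> node type.
    pattern_edges / scene_edges: set of (source, relation, target) triples.
    Returns a dict pattern node -> scene node, or None if no match exists.
    """
    variables = list(pattern_nodes)
    # Try every ordered choice of distinct scene nodes (brute force).
    for candidate in permutations(scene_nodes, len(variables)):
        binding = dict(zip(variables, candidate))
        # Node types must agree.
        if any(pattern_nodes[v] != scene_nodes[binding[v]] for v in variables):
            continue
        # Every pattern edge must be present in the scene under the binding.
        if all((binding[a], rel, binding[b]) in scene_edges
               for a, rel, b in pattern_edges):
            return binding
    return None


# Hypothetical scene graph for one frame of a "give" sequence.
scene_nodes = {"hand_A": "hand", "hand_B": "hand",
               "torso_A": "torso", "cup": "object"}
scene_edges = {("hand_A", "holds", "cup"),
               ("hand_B", "reaches_toward", "cup"),
               ("hand_A", "attached_to", "torso_A")}

# Hypothetical pattern graph for the predicate "give".
give_nodes = {"giver_hand": "hand", "receiver_hand": "hand", "item": "object"}
give_edges = {("giver_hand", "holds", "item"),
              ("receiver_hand", "reaches_toward", "item")}

match = find_subgraph_match(give_nodes, give_edges, scene_nodes, scene_edges)
print(match)
```

A successful match binds the variables of the predicate to concrete body parts and objects in the frame, which is one way to read "producing predicates from pixels". For larger graphs, a dedicated subgraph isomorphism algorithm would be needed in place of exhaustive enumeration.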

(B) We also plan to develop motoric representations of actions. These ideas are based on recent literature on mirror neurons, which suggests that we may recognize an action by imagining performing it, i.e. by utilizing our motor knowledge of that action. We are developing this approach using motion capture data from our Keck laboratory.

The Grammars of Human Behavior

PIs: Yiannis Aloimonos & Ken Nakayama

A project funded by the National Science Foundation (HSD)