The Problem

“Movement and sensation together become the antecedent of meaning... It is movement that makes possible all perceptual categorization.”
(Oliver Sacks)
We propose a novel, interdisciplinary approach to the study of human behavior, focused on the experimental study and computational modeling of the internal representations and associated processes that underlie action perception and understanding by observers, and action planning and execution by actors. To facilitate both careful experimentation and formal theory, we approach the behavior representation problem primarily through the visual system, asking: How do we understand the actions of others using our vision? That is, how do we map image sequences depicting simple actions to the corresponding internal representations that allow, e.g., action recognition and imitation? Further, we explore the higher-level cognitive representations and mechanisms used to categorize, reason about, and judge the movements and actions of others.

The Broader Impact of our proposed interdisciplinary research includes significant advancements in both research and applications in Psychology (e.g., robust social judgments given degraded biological motion), Kinesiology (e.g., analysis, modeling, and training of movement profiles, as in athletics or pathology and rehabilitation), Robotics (e.g., control of anthropomorphic robots), Human and Computer Vision (e.g., automated action recognition in digital video), and other fields concerned with the interpretation and production of human or humanoid action.

The Intellectual Merit of our proposal derives from its principled development and empirical evaluation and refinement of a novel formal theory of the mental representations and processes subserving action understanding and planning. Our work provides a compact but powerful and extensible computational approach to the analysis and synthesis of complex actions (and action sequences) based on a very small set of atomic postural elements ("keyframes" or "anchors") and the corresponding probabilistic, grammatical rules for their combination.
Thus, in a sense, our probabilistic "pose grammar" approach to action representation is similar to state-of-the-art techniques used for speech recognition (e.g., hidden Markov models), but with key postural silhouettes taking the place of phonemes. Such augmented transition grammars also reflect sophisticated new control-theoretic techniques in Robotics for robust anthropomorphic movement.
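To make the analogy with speech models concrete, the following is a minimal sketch of a probabilistic pose grammar, not the project's actual model. It simplifies the hidden Markov model mentioned above to a first-order Markov chain over keyframe labels (no hidden states, no image processing). The pose names, the two example actions, all probabilities, and the function names `log_likelihood`, `recognize`, and `synthesize` are illustrative assumptions that do not come from the proposal.

```python
import math

# Hypothetical keyframe ("anchor") labels and probabilities, chosen only
# for illustration. Each action is a toy probabilistic grammar: start
# probabilities plus first-order transition probabilities between poses.
ACTIONS = {
    "walk": {
        "start": {"stand": 0.8, "left_step": 0.1, "right_step": 0.1},
        "trans": {
            "stand": {"left_step": 0.6, "right_step": 0.4},
            "left_step": {"right_step": 0.9, "stand": 0.1},
            "right_step": {"left_step": 0.9, "stand": 0.1},
        },
    },
    "sit_down": {
        "start": {"stand": 1.0},
        "trans": {
            "stand": {"crouch": 0.9, "stand": 0.1},
            "crouch": {"seated": 0.9, "crouch": 0.1},
            "seated": {"seated": 1.0},
        },
    },
}

FLOOR = 1e-6  # small probability for starts/transitions a grammar omits


def log_likelihood(poses, grammar):
    """Log-probability of a keyframe sequence under one action grammar."""
    if not poses:
        raise ValueError("pose sequence must be non-empty")
    logp = math.log(grammar["start"].get(poses[0], FLOOR))
    for prev, cur in zip(poses, poses[1:]):
        logp += math.log(grammar["trans"].get(prev, {}).get(cur, FLOOR))
    return logp


def recognize(poses, actions=ACTIONS):
    """Analysis: return the action whose grammar scores the sequence highest."""
    return max(actions, key=lambda name: log_likelihood(poses, actions[name]))


def synthesize(grammar, length):
    """Greedy synthesis: start at the likeliest pose, then follow the
    likeliest listed transition until `length` poses are produced."""
    pose = max(grammar["start"], key=grammar["start"].get)
    poses = [pose]
    while len(poses) < length:
        nxt = grammar["trans"].get(pose)
        if not nxt:
            break  # no outgoing rule: the sequence ends early
        pose = max(nxt, key=nxt.get)
        poses.append(pose)
    return poses


print(recognize(["stand", "left_step", "right_step"]))  # walk
print(synthesize(ACTIONS["sit_down"], 3))  # ['stand', 'crouch', 'seated']
```

The same transition tables serve both recognition (scoring an observed keyframe sequence) and synthesis (generating one), which is the sense in which a single grammar can support both analysis and production. A fuller treatment along the lines of the speech analogy would add hidden states and an observation model over silhouettes.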
The action representation system is not monolithic; rather, it spans a spectrum of informational structures at hierarchical levels corresponding to different behavior "spaces": (a) the mechatronic space used in movement planning and production; (b) the cognitive space, involving representations for action recognition, analysis, and evaluation; (c) the visual motion space, which encodes and organizes visual motion caused by human action; and (d) the linguistic motion space, composed of conceptual/symbolic action encodings. Excluding the last of these spaces here, our theoretical, computational, and experimental efforts seek to clarify and formally describe both the nature of the representations in these spaces and, crucially, the mapping of representations across spaces. Notably, we explore a candidate action representation, referred to as a visuomotor representation, which, in facilitating the understanding of observed actions, may recapitulate and resonate with the actual motor representations used to generate movement. Moreover, we present a promising approach for obtaining this representation from discrete action elements or anchors. This endeavor spans a number of research domains, both basic and applied, including Human and Computer Vision, Cognitive and Social Psychology, Kinesiology and Motor Control, Artificial Intelligence and Robotics, and Computer Science and Animation.

The Grammars of Human Behavior

PIs: Yiannis Aloimonos & Ken Nakayama
A project funded by the National Science Foundation (HSD)