Profile

Eadom Dessalene

PhD Candidate
Perception and Robotics Group
Department of Computer Science
University of Maryland

The field of computer vision has traditionally treated the learning of actions from video as a direct extension of the understanding of objects from static images. Humans, however, perceive and predict the behavior of others through an embodied process. My research aims to bridge this gap by developing algorithms that allow robots to learn from Internet videos, and to improve the visual representations learned from passive observation of those videos through active physical interaction with the real world.

I am a PhD candidate at the University of Maryland, College Park. I work with the Perception and Robotics Group and am advised by Yiannis Aloimonos.

Selected Publications

EmbodiSwap for Zero-Shot Robot Imitation Learning
Eadom Dessalene, Pavan Mantripragada, Michael Maynord, Yiannis Aloimonos

We introduce EmbodiSwap, a method for producing photorealistic synthetic robot overlays over human video. We employ EmbodiSwap for zero-shot imitation learning, bridging the embodiment gap between in-the-wild egocentric human video and a target robot embodiment. We train a closed-loop robot manipulation policy over the data produced by EmbodiSwap.
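
As a rough illustration only, the sketch below shows how an EmbodiSwap-style data pipeline could be organized: composite a rendered robot layer over each human video frame and emit (observation, action) pairs for closed-loop policy training. Every helper here (segment_human_arm, retarget_to_robot, render_robot_overlay) is a hypothetical placeholder, not the actual implementation.

import numpy as np

# Hypothetical EmbodiSwap-style sketch: swap the human arm for a rendered robot
# arm in each frame, yielding (observation, action) pairs for policy training.
# All three helpers are illustrative placeholders, not the released code.

def segment_human_arm(frame: np.ndarray) -> np.ndarray:
    """Binary mask over the human hand/arm (placeholder: empty mask)."""
    return np.zeros(frame.shape[:2], dtype=bool)

def retarget_to_robot(hand_pose: np.ndarray) -> np.ndarray:
    """Map an estimated human hand pose to robot joint targets (placeholder: identity)."""
    return hand_pose

def render_robot_overlay(frame: np.ndarray, joints: np.ndarray):
    """Render the target robot at the retargeted pose; return an RGB layer and its mask."""
    return np.zeros_like(frame), np.zeros(frame.shape[:2], dtype=bool)

def embodiswap_frame(frame: np.ndarray, hand_pose: np.ndarray):
    """Composite a synthetic robot layer over a human video frame."""
    human_mask = segment_human_arm(frame)
    joints = retarget_to_robot(hand_pose)
    robot_rgb, robot_mask = render_robot_overlay(frame, joints)
    out = frame.copy()
    out[human_mask] = 0                       # remove the human embodiment
    out[robot_mask] = robot_rgb[robot_mask]   # paste in the robot embodiment
    return out, joints                        # training pair for a closed-loop policy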

Context in Human Action through Motion Complementarity
Eadom Dessalene, Michael Maynord, Cornelia Fermüller, Yiannis Aloimonos

Motivated by Goldman's Theory of Human Action, a framework in which action decomposes into (1) base physical movements and (2) the context in which they occur, we propose a novel learning formulation for motion and context, where context is derived as the complement to motion. More specifically, we model physical movement through the adoption of Therbligs, a set of elemental physical motions centered around object manipulation. Context is modeled through the use of a contrastive mutual information loss that formulates context information as the action information not contained within movement information.
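
One loose reading of this objective is sketched below, assuming per-clip embeddings ctx, motion, and action of shape (B, D): minimize InfoNCE between context and action while maximizing it between context and motion, so the context branch is pushed toward action information the motion branch does not already carry. The temperature and weighting are assumptions, not the paper's exact loss.

import torch
import torch.nn.functional as F

# Illustrative only: two InfoNCE terms standing in for the mutual-information
# objective described above.

def info_nce(query: torch.Tensor, key: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """Standard InfoNCE: the i-th query should match the i-th key within the batch."""
    logits = F.normalize(query, dim=-1) @ F.normalize(key, dim=-1).T / tau
    targets = torch.arange(query.shape[0], device=query.device)
    return F.cross_entropy(logits, targets)

def complementarity_loss(ctx, motion, action, lam: float = 1.0) -> torch.Tensor:
    # Minimizing the first term pulls context toward action information; the
    # negated second term pushes context away from information already in motion.
    return info_nce(ctx, action) - lam * info_nce(ctx, motion)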

LEAP: LLM-Generation of Egocentric Action Programs
Eadom Dessalene, Michael Maynord, Cornelia Fermüller, Yiannis Aloimonos

We introduce LEAP, a novel method for generating video-grounded action programs through the use of a Large Language Model (LLM). These action programs represent the motoric, perceptual, and structural aspects of action, and consist of sub-actions, pre- and post-conditions, and control flows. LEAP's action programs are centered on egocentric video and employ recent developments in LLMs both as a source of program knowledge and as an aggregator and assessor of multimodal video information.
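
For illustration, an action program of this kind might be stored in a structure like the following; the field names and schema are assumptions rather than LEAP's released format.

from dataclasses import dataclass, field

# Illustrative schema for a video-grounded action program.

@dataclass
class SubAction:
    name: str                  # e.g. "grasp(knife)"
    preconditions: list[str]   # predicates that must hold before the sub-action
    postconditions: list[str]  # predicates expected to hold afterwards
    start_frame: int           # grounding of the sub-action in the video
    end_frame: int

@dataclass
class ActionProgram:
    goal: str                                              # e.g. "slice the tomato"
    steps: list[SubAction] = field(default_factory=list)
    control_flow: list[str] = field(default_factory=list)  # e.g. "repeat step 2 until sliced"

Under this sketch, an LLM given per-frame visual information could be prompted to emit instances of such a schema, which can then be checked against the video.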

Therbligs in Action: Video Understanding through Motion Primitives
Eadom Dessalene, Michael Maynord, Cornelia Fermüller, Yiannis Aloimonos

In this paper we introduce a rule-based, compositional, and hierarchical modeling of action using Therbligs as our atoms. Introducing these atoms provides us with a consistent, expressive, contact-centered representation of action. Over the atoms we introduce a differentiable method of rule-based reasoning to regularize for logical consistency. Our approach is complementary to other approaches in that the Therblig-based representations produced by our architecture augment rather than replace existing architectures' representations. We release the first Therblig-centered annotations over two popular video datasets: EPIC Kitchens 100 and 50-Salads.
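
As a sketch of what one differentiable rule could look like, the snippet below penalizes soft Therblig predictions that place probability on a "release" before any "grasp" has occurred; the specific rule and the cumulative-probability formulation are illustrative assumptions, not necessarily the rules used in the paper.

import torch

# Illustrative rule: "release requires a prior grasp", expressed as a soft,
# differentiable penalty over per-frame Therblig probabilities.

def release_requires_grasp(probs: torch.Tensor, grasp: int, release: int) -> torch.Tensor:
    """probs: (T, K) soft Therblig predictions over T frames and K classes.
    Penalizes probability on 'release' at time t that exceeds the cumulative
    probability that a 'grasp' has already occurred."""
    grasp_so_far = torch.cumsum(probs[:, grasp], dim=0).clamp(max=1.0)
    violation = (probs[:, release] - grasp_so_far).clamp(min=0.0)
    return violation.mean()  # added to the task loss as a consistency regularizer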

Mid-Vision Feedback
Michael Maynord, Eadom T. Dessalene, Cornelia Fermüller, Yiannis Aloimonos

Feedback plays a prominent role in biological vision, where perception is modulated by agents' evolving expectations and world models. We introduce a novel mechanism that modulates perception based on high-level categorical expectations: Mid-Vision Feedback (MVF). MVF associates high-level contexts with linear transformations. When a context is "expected," its associated linear transformation is applied over feature vectors at a mid level of a network. The result is that mid-level network representations are biased towards conformance with high-level expectations, improving overall accuracy and contextual consistency.
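
A minimal sketch of this mechanism, assuming a convolutional backbone split into lower and upper halves and one learned linear map per context; layer placement and sizes are assumptions.

import torch
import torch.nn as nn

# Sketch: one learned linear map per context, applied to every mid-level feature
# vector whenever that context is expected.

class ContextFeedback(nn.Module):
    def __init__(self, num_contexts: int, feat_dim: int):
        super().__init__()
        self.transforms = nn.ModuleList(
            [nn.Linear(feat_dim, feat_dim) for _ in range(num_contexts)]
        )

    def forward(self, mid_feats: torch.Tensor, context_id: int) -> torch.Tensor:
        # mid_feats: (B, C, H, W) feature maps from the lower half of a backbone.
        b, c, h, w = mid_feats.shape
        x = mid_feats.permute(0, 2, 3, 1).reshape(-1, c)  # one vector per spatial location
        x = self.transforms[context_id](x)                # bias features toward the context
        return x.reshape(b, h, w, c).permute(0, 3, 1, 2)  # back to (B, C, H, W)

In this sketch, the transformed features would then continue through the upper half of the network, biasing later predictions toward the expected context.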

Forecasting action through contact representations from first person video
Eadom Dessalene, Chinmaya Devaraj, Michael Maynord, Cornelia Fermüller, Yiannis Aloimonos

Human actions involving hand manipulation are structured around the making and breaking of hand-object contact, and human visual understanding of action relies on the anticipation of contact, as demonstrated by pioneering work in cognitive science. Taking inspiration from this, we introduce representations and models centered on contact, which we then use for action prediction and anticipation.

Using geometric features to represent near-contact behavior in robotic grasping
Eadom Dessalene, Yi Herng Ong, John Morrow, Ravi Balasubramanian, Cindy Grimm

In this paper we define two feature representations for grasping. These representations capture hand-object geometric relationships at the near-contact stage, before the fingers close around the object. Their benefits are: 1) they are stable under noise in both joint and pose variation; 2) they are largely hand- and object-agnostic, enabling direct comparison across different hand morphologies; and 3) their format makes them suitable for direct application of machine learning techniques developed for images.
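
Purely as an illustration of an image-like geometric feature at the near-contact stage (and not one of the two representations defined in the paper), one could rasterize nearest-surface distances from a grid of points swept between the fingers into a small 2D array that standard image models can consume:

import numpy as np

# Purely illustrative: rasterize nearest-surface distances from sample points
# near the gripper into a small 2D array usable by image-based models.

def distance_grid(sample_points: np.ndarray, object_points: np.ndarray,
                  grid_shape: tuple[int, int] = (16, 16)) -> np.ndarray:
    """sample_points: (H*W, 3) points swept over a plane between the fingers;
    object_points: (M, 3) points on the object surface.
    Returns an (H, W) 'image' of nearest-surface distances."""
    dists = np.linalg.norm(sample_points[:, None, :] - object_points[None, :, :], axis=-1)
    return dists.min(axis=1).reshape(grid_shape)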

Invited Talks

[Jan 2026] - Embodied Action Understanding - NASA JPL Vision Seminar, NASA Jet Propulsion Laboratory

[Dec 2025] - Generative Models of Action - NVIDIA Research Radar, NVIDIA

[Nov 2024] - Understanding Actions from Video - NYC Computer Vision Day (NYC CV Day)

[Sep 2024] - Learning the Organization of Action - University of Maryland, Baltimore County

[Jun 2024] - Learning the Organization of Action - Telluride Neuromorphic Workshop

[May 2023] - Understanding Actions from Video - CoRL Cognitive Science Workshop

Reviewer

ICRA, ICLR, CVPR, PAMI, WACV

Competitions

EPIC-Kitchens Action Recognition Challenge 2024 – Team Lead, 4th Place

Alexa Prize SimBot Challenge – Team Lead, University of Maryland (Qualified for Semi-Finals)

EPIC-Kitchens Action Anticipation Challenge 2020 – Team Lead, 1st Place

Amazon Robotics Challenge – Team Lead, Qualified for Finals