Eadom Dessalene

I am a PhD Candidate in Computer Science at the University of Maryland, College Park, where I work with the Perception and Robotics Group and am advised by Yiannis Aloimonos.

I study how robots can learn from human videos and how visual representations learned from passive observation can be improved through embodied physical interaction. Broadly, I am interested in embodied action understanding, robot learning, egocentric vision, and multimodal perception.

News & Talks

Jan 2026 Invited talk: Embodied Action Understanding, NASA JPL Vision Seminar, NASA Jet Propulsion Laboratory
Dec 2025 Invited talk: Generative Models of Action, NVIDIA Research Radar, NVIDIA
Nov 2024 Invited talk: Understanding Actions from Video, NYC Computer Vision Day
Sep 2024 Invited talk: Learning the Organization of Action, University of Maryland, Baltimore County
Jun 2024 Invited talk: Learning the Organization of Action, Telluride Neuromorphic Workshop
May 2023 Invited talk: Understanding Actions from Video, CoRL Cognitive Science Workshop

Selected Publications

FEEL (Force-Enhanced Egocentric Learning): A Dataset for Physical Action Understanding
Eadom Dessalene, Botao He, Michael Maynord, Yonatan Tussa, Pavan Mantripragada, Yianni Karabatis, Nirupam Roy, Yiannis Aloimonos
arXiv 2026
FEEL is the first large-scale egocentric dataset pairing video with synchronized force measurements from custom piezoresistive gloves. The dataset contains approximately 3 million force-synchronized frames of natural, unscripted kitchen manipulation and introduces force as a physically grounded supervisory signal for contact understanding and action representation learning.
EmbodiSwap for Zero-Shot Robot Imitation Learning
Eadom Dessalene, Pavan Mantripragada, Michael Maynord, Yiannis Aloimonos
arXiv 2025
We introduce EmbodiSwap, a method for producing photorealistic synthetic robot overlays on human video. The approach helps bridge the embodiment gap between in-the-wild egocentric human videos and a target robot embodiment, enabling zero-shot imitation learning for robot manipulation.
Context in Human Action through Motion Complementarity
Eadom Dessalene, Michael Maynord, Cornelia Fermüller, Yiannis Aloimonos
WACV 2024
We propose a learning framework in which context is modeled as the complement of motion. Physical movement is represented through Therbligs, while context is captured using a contrastive mutual-information-based objective.
LEAP: LLM-Generation of Egocentric Action Programs
Eadom Dessalene, Michael Maynord, Cornelia Fermüller, Yiannis Aloimonos
arXiv 2023
LEAP generates video-grounded action programs composed of sub-actions, conditions, and control flows. It uses large language models to combine program knowledge with multimodal evidence from egocentric videos.
Therbligs in Action: Video Understanding through Motion Primitives
Eadom Dessalene, Michael Maynord, Cornelia Fermüller, Yiannis Aloimonos
CVPR 2023
We introduce a compositional and hierarchical framework for action understanding based on Therbligs as motion primitives, along with differentiable rule-based reasoning for logical consistency.
Mid-Vision Feedback
Michael Maynord, Eadom T. Dessalene, Cornelia Fermüller, Yiannis Aloimonos
ICLR 2023
We introduce Mid-Vision Feedback, a mechanism that biases mid-level network representations using high-level categorical expectations, improving contextual consistency and recognition performance.
Forecasting Action through Contact Representations from First-Person Video
Eadom Dessalene, Chinmaya Devaraj, Michael Maynord, Cornelia Fermüller, Yiannis Aloimonos
PAMI 2021
We develop contact-centered representations and models for first-person video, motivated by the role of hand-object contact in the structure and anticipation of human action.
Using Geometric Features to Represent Near-Contact Behavior in Robotic Grasping
Eadom Dessalene, Yi Herng Ong, John Morrow, Ravi Balasubramanian, Cindy Grimm
ICRA 2019
We define hand-object geometric feature representations for robotic grasping at the near-contact stage, designed to be robust to noise and morphology differences and suitable for direct use in machine learning.

Service

Reviewer: ICRA, ICLR, CVPR, PAMI, WACV

Competitions

EPIC-Kitchens Action Recognition Challenge 2024 — 4th Place
Alexa Prize SimBot Challenge — Team Lead, University of Maryland (Qualified for Semi-Finals)
EPIC-Kitchens Action Anticipation Challenge 2020 — 1st Place
Amazon Robotics Challenge — Qualified for Finals