Compositional and Robust Action Understanding

Talk
Huijuan Xu
Time: 03.30.2021 13:00 to 14:00

With massive video data becoming available from a wide range of applications (e.g., smart home devices, medical instruments, intelligent transportation networks), designing algorithms that understand actions can enable machines to interact meaningfully with human partners. Practically, continuous video streams require temporal localization of actions before a trimmed action recognition method can be applied, yet such annotation is expensive and suffers from consistency issues. Moreover, early video understanding technologies mostly use holistic frame modeling and lack reasoning capabilities. In this talk, I will discuss how to detect actions in continuous video streams efficiently. Specifically, I will present several temporal action detection models with different levels of supervision. Next, I will introduce how to understand actions compositionally, using localized foreground subjects or objects to reduce the effect of confounding variables and to draw connections to common knowledge of the involved objects. Additionally, natural language provides an efficient and intuitive way to convey the details of an action to a human. I will conclude the talk with some perspectives on how compositional and efficient modeling opens the door to real-world action understanding with high complexity and fine granularity.
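To make the "localize, then recognize" pipeline in the abstract concrete, here is a minimal Python sketch: a toy actionness-thresholding step proposes candidate segments in an untrimmed stream, and a stand-in classifier then scores each proposed segment. All names and the scoring logic (propose_segments, classify_segment, per-frame actionness scores) are hypothetical simplifications for illustration, not the models presented in the talk.

```python
# Hypothetical two-stage sketch: (1) propose temporal segments from per-frame
# actionness scores, (2) score each segment with a stand-in trimmed classifier.
import numpy as np

def propose_segments(frame_scores, threshold=0.5, min_len=8):
    """Group consecutive frames whose actionness exceeds a threshold
    into candidate (start, end) segments, dropping very short runs."""
    active = frame_scores > threshold
    segments, start = [], None
    for t, a in enumerate(active):
        if a and start is None:
            start = t
        elif not a and start is not None:
            if t - start >= min_len:
                segments.append((start, t))
            start = None
    if start is not None and len(active) - start >= min_len:
        segments.append((start, len(active)))
    return segments

def classify_segment(frame_scores, segment):
    """Stand-in for a trimmed action recognizer: here, simply the mean
    actionness over the segment (a real model would classify the clip)."""
    s, e = segment
    return float(frame_scores[s:e].mean())

# Toy untrimmed stream: 200 frames of background noise with two action bursts.
rng = np.random.default_rng(0)
scores = rng.uniform(0.0, 0.4, 200)
scores[40:80] += 0.5    # first action instance
scores[130:160] += 0.5  # second action instance

for seg in propose_segments(scores):
    print(seg, round(classify_segment(scores, seg), 3))
```

In this fully supervised caricature the threshold plays the role of learned segment boundaries; the weakly supervised variants mentioned in the talk would have to infer such boundaries without frame-level annotation.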