Learning Spatio-Temporal Representations for Video Understanding

Talk

Du Tran

Talk Series:

Visitors

Time:

02.22.2021 13:00 to 14:00

Location:

https://umd.zoom.us/j/94543765116?pwd=clY3MVV5Z1g4T2xpdnJMdjFiMFhYdz09

URL:

https://talks.cs.umd.edu/talks/2772

Video understanding is one of the fundamental problems in computer vision, with applications including autonomous vehicles, robot learning, and visual perception. Compared with traditional image understanding, video understanding (i) has higher model complexity and requires learning from a much larger amount of data; (ii) requires more expensive annotations; and (iii) sometimes demands multimodal modeling, e.g., audiovisual rather than visual-only modeling. In this talk, I will present some of our approaches to addressing these challenges, such as efficient and scalable spatiotemporal learning, cross-modal self-supervised learning of video and audio representations, and multimodal learning. Finally, I will outline several potential future research directions in this area.