Advancing Fully-Open Audio General Intelligence

Talk

Sreyan Ghosh

Time:

03.10.2026 12:30 to 13:30

Location:

IRB 2207

URL:

https://talks.cs.umd.edu/talks/4522

This talk traces the arc of our research in advancing audio intelligence, culminating in the development of Audio Flamingo 2, 3, and Next, as well as Music Flamingo. I will begin with our early contributions in representation learning and synthetic data generation, which laid the foundation for robust audio-language models. Building on that foundation, we helped shape the later versions of Audio Flamingo, a family of fully open, large-scale audio-language models capable of understanding speech, sounds, and music at unprecedented scale. I will conclude by highlighting how designing rigorous evaluation benchmarks (MMAU and MMAU-Pro) has been pivotal for progress in audio intelligence, and how expanding toward omni-modal intelligence can accelerate the shift from recognition to expert-level reasoning across audio and beyond.