UMD Model Joins ‘Arms Race’ to Develop AI-Powered Tech that Understands Music

Nvidia-backed platform breaks down songs based on personal taste, emotion and mood.

ABBA enthusiasts instantly recognize the 1976 hit “Money, Money, Money” when they hear its quick-tempo staccato piano intro, while pop music devotees with musical training might also note the 4/4 time signature in the key of A.

But would the Swedish band’s superfans register that the song isn’t just pop, but a disco-funk-pop hybrid brightened by frequent major-chord lifts? Or that its four-on-the-floor acoustic drumbeat complements a syncopated electric bass and full brass section, or that lead vocalist Anni-Frid Lyngstad is a mezzo-soprano whose slightly nasal timbre evokes sassiness? 

Such details are tracked by Music Flamingo, a new artificial intelligence (AI) model trained by University of Maryland computer scientists in collaboration with Nvidia to experience music as much like a human listener as possible. The technology builds on years of work to make audio and speech understandable to AI models, and could ultimately enable a tool that makes song and playlist recommendations determined not just by listening habits, but also by the mood of the moment.

Popular streaming platforms like Spotify use algorithms that convert metadata into broad labels—Top 100 rap hits of the 1980s, for instance—but don’t break down a song’s emotions or musical structure, according to Sreyan Ghosh, a UMD doctoral student in computer science who describes the system in a technical paper posted this week.

“It’s based on clicks rather than deep understanding of the music, but multiple elements go into a song that makes us feel the way we feel,” he said.

An app based on Music Flamingo, in contrast, could start with a listener’s baseline preferences pegged to listening history—like a fondness for gravelly vocals or waltz crescendos—then adjust its recommendations in response to ChatGPT-like prompts.

“You could tell the model, ‘I’m feeling low. Can you cheer me up?’” said Ghosh.
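To picture that flow concretely, here is a minimal Python sketch of how a hypothetical app might fold a listener’s stored tastes and a mood-of-the-moment prompt into a single request for a music-understanding model. Every name here, including ListenerProfile and build_recommendation_query, is invented for illustration and is not part of Music Flamingo or any Nvidia API.

```python
# A minimal, hypothetical sketch of the prompt-driven flow described above.
# Nothing here is Music Flamingo's actual interface; the names are invented
# to show how stable tastes and a free-text mood prompt might combine.
from dataclasses import dataclass, field


@dataclass
class ListenerProfile:
    """Baseline preferences inferred from listening history."""
    favorite_attributes: list[str] = field(default_factory=list)


def build_recommendation_query(profile: ListenerProfile, mood_prompt: str) -> str:
    """Fold stable tastes and the current mood into one natural-language request."""
    tastes = ", ".join(profile.favorite_attributes) or "no stated preferences"
    return (
        f"Listener tastes: {tastes}. "
        f"Current request: {mood_prompt} "
        "Recommend songs whose emotional and musical qualities fit both."
    )


if __name__ == "__main__":
    profile = ListenerProfile(
        favorite_attributes=["gravelly vocals", "waltz crescendos"]
    )
    print(build_recommendation_query(profile, "I'm feeling low. Can you cheer me up?"))
```

In a real system the resulting text would accompany audio features sent to the model; the sketch only makes visible the two inputs the article describes, a stable taste profile and a momentary mood.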

Music Flamingo is designed so that it doesn’t just recognize a guitar; it recognizes a nylon-string guitar plucked in the flamenco style. Lyrically, the indie ballad “Jim and Pam,” in which the vocalist croons, “I fell in love with my best friend… I’ll leave you sleeping on my shoulder,” teaches the model about affectionate friendship and unwavering support, so it can respond to requests that touch on those themes.

Ghosh plans to release the open-source technology within months through Nvidia, which funded his Ph.D. via a graduate fellowship. The tool could ultimately be packaged into a music-listening app by a third party, he said.
