UMD Team Advances AI Audio Systems with New Training Data and Benchmarks
Artificial intelligence (AI) systems that understand and produce text and images have surged into mainstream use, flooding the internet and social media feeds with AI-generated content.
But the same can’t be said for audio-based AI systems, known as Large Audio-Language Models (LALMs), which use machine learning, speech recognition, and natural language processing to convert spoken words into data, or to synthesize realistic human-like voices and music from text or other audio.
This deficiency stems in part from the fact that researchers and software developers working on LALMs have had little access to large-scale audio training datasets or measurable benchmarks for assessing their work.
University of Maryland researchers, collaborating with scientists at tech giant NVIDIA, are working to close this gap, developing open-source AI platforms that offer a rich array of training data and a set of comprehensive benchmarks that will advance the efficiency and reliability of LALMs.
The researchers are presenting their findings as a spotlight paper at the Conference on Neural Information Processing Systems (NeurIPS), taking place in December in both San Diego and Mexico City.
In their paper, the researchers introduce Audio Flamingo 3, a fully open state-of-the-art LALM that advances reasoning and understanding across speech, sound, and music using a unified audio encoder that is trained using a novel strategy for joint representation learning.
The research team also developed several training datasets curated using novel strategies, achieving state-of-the-art results on more than 20 benchmarks used to measure audio understanding and reasoning. These results surpassed both open-weight and closed-source models that were trained on much larger datasets.
The value of these benchmarks, and of the open-source training data, is that they will spur further research in this domain, says Ramani Duraiswami, a University of Maryland professor of computer science who was a co-author on the paper.
Click HERE to read the full article
The Department welcomes comments, suggestions and corrections. Send email to editor [-at-] cs [dot] umd [dot] edu.
