MS Defense: Deep Neural Networks for End-to-End Optimized Speech Coding

Talk

Srihari Kankanahalli

Time:

08.09.2017 13:00 to 14:30

Location:

AVW 3450

URL:

https://talks.cs.umd.edu/talks/1828

Modern compression algorithms are the result of years of research; industry standards such as MP3, JPEG, and G.722.1 rely on complex hand-engineered compression pipelines that required extensive manual tuning by the engineers who created them. Recently, deep neural networks have shown a sophisticated ability to learn directly from data, outperforming traditional hand-engineered features in many areas. Our aim is to extend these "deep learning" methods into the domain of compression.
We present a novel deep neural network model and train it to optimize all the steps of a wideband speech-coding pipeline (compression, quantization, entropy coding, and decompression) end-to-end, directly from raw speech data, with no manual feature engineering necessary. In testing, our learned speech coder performs on par with or better than current standards at a variety of bitrates (~9 kbps up to ~24 kbps). It also runs in real time on an Intel i7-4790K CPU.
Examining Committee:
Chair: Dr. David Jacobs
Members: Dr. Ramani Duraiswami
Dr. Carol Espy-Wilson
Dr. Shihab Shamma
