Speech and Audio Developments and Challenges in Industry
Sefik Emre Eskimez
Time:
04.16.2026 12:30 to 13:30
Location:
This talk examines how conversational voice agents are built in industry, tracing the evolution from cascaded pipelines (ASR -> LLM -> TTS) to thinker-talker and end-to-end architectures. We discuss key design decisions, including audio representations, turn-taking, and full-duplex interaction, and address the practical challenges that separate research from production: training data at scale, evaluation beyond offline metrics, robustness under real-world conditions, echo cancellation, noise handling, and latency budgeting.