Efficient Acoustic Simulation for Learning-based Virtual and Real World Audio Processing

Talk
Zhenyu Tang
Time: 04.01.2022 12:30 to 14:30
Location: IRB 5105

Sound propagation is commonly perceived as air pressure perturbations caused by vibrating or moving objects. The energy of sound is attenuated as it travels through the air over a distance and as it is absorbed at object surfaces. Numerous research efforts have focused on devising better acoustic simulation methods to model sound propagation more realistically. The benefits of accurate acoustic simulation include, but are not limited to, computer-aided acoustic design, acoustic optimization, synthetic speech data generation, and immersive audio-visual rendering for mixed reality. However, acoustic simulation remains underexplored in relevant virtual and real-world audio processing applications. The main challenges in adopting accurate acoustic simulation methods are the tradeoff between accuracy and time-space cost, and the difficulty of acquiring and reconstructing acoustic scenes in the real world.

In this dissertation, we propose novel methods to overcome the above challenges by leveraging the inferential power of deep neural networks and combining them with interactive acoustic simulation techniques. First, we develop a neural network model that learns the acoustic scattering field of different objects given their 3D representations as input. This work facilitates the inclusion of wave acoustic scattering effects in interactive sound rendering applications, which used to be difficult without intensive pre-computation. Second, we incorporate a deep acoustic analysis neural network into the sound rendering pipeline to enable the generation of sounds that are perceptually consistent with real-world sounds. This is achieved by predicting acoustic parameters at run-time from real-world audio samples and optimizing simulation parameters accordingly. Finally, we build a pipeline that utilizes general 3D indoor scene datasets to generate high-quality acoustic room impulse responses, and we demonstrate the usefulness of the generated data on several practical speech processing tasks. Our results demonstrate that by leveraging state-of-the-art physics-based acoustic simulation and deep learning techniques, realistic simulated data can be generated to enhance sound rendering quality in the virtual world and boost the performance of audio processing tasks in the real world.
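As an illustration of how simulated room impulse responses (RIRs) are typically used for speech processing tasks, the minimal sketch below convolves a clean speech waveform with an RIR to synthesize a reverberant (far-field) recording. This is a generic data-augmentation pattern, not code from the dissertation; the function name `reverberate` and the file names in the usage comment are hypothetical.

```python
# Illustrative sketch: turning a simulated RIR into reverberant training data.
import numpy as np
from scipy.signal import fftconvolve

def reverberate(clean_speech: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve clean speech with an RIR to simulate a far-field recording."""
    # Full convolution, truncated back to the original utterance length.
    reverberant = fftconvolve(clean_speech, rir, mode="full")[: len(clean_speech)]
    # Rescale so the reverberant signal keeps the clean signal's peak level.
    peak = np.max(np.abs(reverberant))
    if peak > 0:
        reverberant *= np.max(np.abs(clean_speech)) / peak
    return reverberant

# Hypothetical usage (file names are placeholders):
# clean, sr = soundfile.read("clean_utterance.wav")
# rir, _ = soundfile.read("simulated_rir.wav")
# farfield = reverberate(clean, rir)
```

Data augmented this way is commonly used to train far-field speech models such as automatic speech recognition or speech enhancement systems.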
Examining Committee:

Chair: Dr. Dinesh Manocha
Dean's Representative: Dr. Carol Espy-Wilson
Members: Dr. Ming C. Lin, Dr. Ramani Duraiswami, Dr. Nirupam Roy