PhD Proposal: Towards Immersive Visual Content with Machine Learning

Brandon Feng
01.26.2022 12:00 to 14:00

IRB 5105

Virtual and augmented reality technology is on the cusp of dramatically changing the way we see, learn, and engage with the rest of the world. This exciting future promises the ability to generate and distribute immersive content to users. However, transforming data captured with physical cameras into content suitable for immersive experiences still requires overcoming numerous challenges.

In this proposal, I first discuss the problem of recovering depth information from videos captured using 360-degree cameras. Depth information is crucial in creating immersive visual experiences with real-world captured data, because it 1) enables 3D rendering based on the viewer’s position, and 2) allows scene editing effects such as relighting and object insertion. I present a novel method that unifies the representation of object depth and surface normal using double quaternions. I have validated my approach through experiments showing that training with a double-quaternion-based loss function improves the prediction accuracy of a neural network that takes 360-degree video frames as input.

Next, I discuss the problem of efficiently representing 4D light fields. Light fields have significant potential for immersive visual applications. An important obstacle to their widespread adoption is the extreme cost of storing and transmitting such high-dimensional data. Building on past research into compressing light field content, I present a novel approach to representing light fields with neural networks. Unlike prior methods that divide the light field content into patches and encode each patch separately, my approach treats the light field data as a mapping function between pixel coordinates and color. I further demonstrate the feasibility of training a neural network to accurately learn such a mapping function, and show that embedding the light field pixel coordinates using the Gegenbauer polynomials is crucial for achieving high reconstruction quality. Finally, I show that such a functional representation accomplishes high-quality interpolation and super-resolution on light fields.

I conclude my proposal with an overview of potential ideas for further improving the efficiency of immersive content representation using neural networks.

Examining Committee:

Chair: Dr. Amitabh Varshney
Department Representative: Dr. Furong Huang
Members: Dr. Christopher Metzler