PhD Proposal: Extracting Scene Motion at High Speed

Anton Mitrokhin
05.09.2019 13:00 to 15:00
IRB 5107

With recent advances in the fields of robotics and computer vision, autonomous platforms are no longer restricted to research laboratories - they need to operate in an open world, where reliability and safety are key factors. Recently, there has been much progress in imaging sensor technology, offering alternative solutions to scene perception. Neuromorphic event-based sensors, inspired by the transient pathway of mammalian vision, offer exciting alternatives for visual motion perception. Such sensors do not record image frames, but instead the changes of lighting occurring independently at every camera pixel. Each of these changes is transmitted asynchronously and is called an event. By design, this sensor accommodates a large dynamic range and provides high temporal resolution and low latency - ideal properties for applications where high-quality motion estimation and tolerance of challenging lighting conditions are desirable.

In this thesis proposal, we develop a set of event-based vision methods for dense depth prediction, trajectory estimation, object tracking and motion segmentation. We leverage the accurate timestamp information provided by event-based sensors and the continuity of the event stream to allow for robust, low-latency scene motion extraction.

In the first part of the proposal, we show how event cloud warping can be used for simultaneous egomotion estimation and motion segmentation. We represent a small slice of the cloud (typically 50 milliseconds wide in time) as a 3D point cloud, where each point is given by (x, y, t) - the pixel coordinates and the event timestamp. We then project this cloud onto the image plane, using timestamps as pixel values, hence preserving most of the local 3D structure of the cloud. This time-image is then used as an error function for iterative event slice warping, until the gradient of the timestamps is minimized.
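As a rough illustration of the time-image idea (this is not the author's implementation - the constant-velocity warp model and all names are illustrative assumptions), the sketch below projects a warped event slice onto the image plane with per-pixel average timestamps, and scores a candidate warp by the magnitude of the timestamp gradient:

```python
import numpy as np

def time_image(x, y, t, vx, vy, shape):
    """Warp an event slice by a candidate pixel velocity (vx, vy) and
    project it onto the image plane, using event timestamps as pixel
    values (average timestamp per pixel)."""
    t0 = t.min()
    # Warp: move each event back along the candidate motion direction.
    xw = np.round(x - vx * (t - t0)).astype(int)
    yw = np.round(y - vy * (t - t0)).astype(int)
    # Keep only events that still land inside the image.
    keep = (xw >= 0) & (xw < shape[1]) & (yw >= 0) & (yw < shape[0])
    xw, yw, ts = xw[keep], yw[keep], t[keep] - t0
    img = np.zeros(shape)
    cnt = np.zeros(shape)
    np.add.at(img, (yw, xw), ts)   # sum of timestamps per pixel
    np.add.at(cnt, (yw, xw), 1.0)  # event count per pixel
    nz = cnt > 0
    img[nz] /= cnt[nz]             # average timestamp per pixel
    return img

def warp_error(x, y, t, vx, vy, shape):
    """Error = magnitude of the timestamp gradient. A motion-compensating
    warp collapses each moving edge's events onto few pixels, flattening
    the time-image and shrinking its gradient."""
    gy, gx = np.gradient(time_image(x, y, t, vx, vy, shape))
    return float(np.sum(gx**2 + gy**2))
```

Minimizing `warp_error` over candidate velocities (by grid or gradient search, say) yields the motion-compensating warp for the slice; events from independently moving objects remain misaligned under that warp, which is what enables the segmentation.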
The resulting warped cloud, as well as the corresponding time-image, is then used for egomotion estimation and motion segmentation. The proposal will also explore the possibility of applying 3D warping techniques to larger time slices, tens of seconds long, which will allow us to improve motion segmentation.

In the second part of the proposal, we develop a learning pipeline to estimate dense depth, flow and egomotion in a self-supervised manner using only event information - a challenging task, since event-based information is sparse. Our learning pipeline performs egomotion and depth estimation well at night, and can transfer from day to night sequences. We believe that this success is due to a novel data representation, which preserves the temporal information carried by the events. We then extend this pipeline by adding motion segmentation and object velocity prediction, using a multislice 3D-like input to further improve robustness to noise. We also create a state-of-the-art event-based vision dataset - EV-IMO - which is the first to provide motion segmentation masks for a non-simulated event stream recording, as well as egomotion and depth ground truth at rates of up to 200 frames per second.

Examining Committee:

Chair: Dr. Yiannis Aloimonos
Dept rep: Dr. Matthias Zwicker
Members: Dr. Cornelia Fermuller