PhD Proposal: Deep Learning: Lightweight, Versatile Implementation and Applications in Cognitive Robots

Talk
Chengxi Ye
Time: 05.27.2016, 11:00 to 12:30
Location: AVW 4424

Deep neural networks have given rise to major advances in many problems of machine intelligence. Most current implementations of neural network models place the primary emphasis on efficiency. They often involve multiple programming languages, and thoroughly understanding the low-level engineering details or modifying the models requires extensive effort. A straightforward and self-explanatory deep learning framework is therefore highly desirable to accelerate the understanding and application of deep neural network models.
We present LightNet, a lightweight, versatile, and purely MATLAB-based high-level implementation of deep learning algorithms. The goal of its design is to provide an easy-to-understand, easy-to-use, and efficient computational platform for deep learning research. The core modules in LightNet are implemented in hundreds of lines of code, orders of magnitude more succinct than existing deep learning pipelines. LightNet supports the major deep learning architectures: multilayer perceptron networks (MLPs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs). The framework supports computation on both CPU and GPU, and switching between them is straightforward. Applications in computer vision, natural language processing, and robotics are demonstrated through experiments.
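To illustrate how straightforward a CPU/GPU switch can be in a purely MATLAB setting, consider the minimal sketch below. The gpuArray and gather calls are standard MATLAB functions (Parallel Computing Toolbox); the variable names and the single layer shown are illustrative assumptions, not LightNet's actual API.

    % Minimal sketch of CPU/GPU switching in MATLAB (hypothetical example,
    % not LightNet's actual API). gpuArray/gather require the Parallel
    % Computing Toolbox and a supported GPU.
    use_gpu = true;

    X = rand(128, 784, 'single');          % a batch of 128 inputs
    W = randn(784, 100, 'single') * 0.01;  % weights of a fully connected layer

    if use_gpu
        X = gpuArray(X);                   % move data to GPU memory
        W = gpuArray(W);
    end

    % The same vectorized code runs unchanged on CPU arrays or gpuArrays.
    H = max(X * W, 0);                     % fully connected layer + ReLU

    H = gather(H);                         % copy the result back to the host

Because MATLAB dispatches the same vectorized operations on both array types, the layer code itself needs no device-specific branches; only the data placement changes.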
LightNet will be extended to include the latest neural network architectures and training methods. We will use LightNet to study fundamental properties of neural networks.
In the second part of this work, we use deep learning techniques to address a fundamental problem in robotics. For a robot to perform real-world tasks, it is critical that it have a functional understanding of the visual scene. We address the problem of localizing and recognizing functional areas in arbitrary indoor scenes, formulated as a two-stage deep-learning-based detection pipeline that takes 2D real-world images as input. We demonstrate that our detection model generalizes to novel indoor scenes by cross-validating it with images from two different datasets. We further propose to complete a human-like pipeline by using depth information to verify or to guide fine-scale actions.
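The overall shape of such a two-stage pipeline can be sketched in MATLAB as below. Here propose_regions and classify_function are hypothetical stubs standing in for the two learned stages (region proposal, then functional-category classification); the box format follows MATLAB's imcrop convention, and none of this is the proposal's actual code.

    % Illustrative two-stage detection flow. propose_regions and
    % classify_function are hypothetical stubs, not the proposal's code.
    img = imread('indoor_scene.jpg');          % 2D real-world input image

    boxes = propose_regions(img);              % stage 1: candidate functional areas
    labels = zeros(size(boxes, 1), 1);
    for i = 1:size(boxes, 1)
        patch = imcrop(img, boxes(i, :));      % crop one candidate region
        labels(i) = classify_function(patch);  % stage 2: functional category
    end

    function boxes = propose_regions(~)
        % Stub: a trained proposal network would predict these boxes.
        boxes = [10 10 100 100; 50 60 120 80]; % one [x y width height] per row
    end

    function label = classify_function(~)
        % Stub: a trained classifier would assign the functional label.
        label = 1;
    end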
Examining Committee:
Chair: Dr. Yiannis Aloimonos
Dept. rep: Dr. Adam Porter
Member: Dr. Cornelia Fermuller