PhD Proposal: Towards Fast and Efficient Representation Learning
Convolutional Neural Networks (CNNs) have achieved enormous success in computer vision, speech recognition, natural language processing, and many other fields. The success of CNNs lies in their ability to automatically learn complex feature representations. With the increasing scale of data and model complexity, there is surging interest in fast model training and inference. Accelerating model training and reducing model complexity are critical for real-world machine learning applications. In this proposal, we explore several directions for fast and efficient representation learning: a) how to utilize distributed computing resources with fast network connections; b) how to reduce model complexity for fast inference; c) how to train discrete models on devices with constrained computing resources; and d) how to automatically design efficient architectures for unknown tasks.
In the first part, we investigate how to scale out single-node machine learning algorithms in a distributed environment with fast network connections. We build a distributed data-parallel machine learning framework that utilizes Remote Direct Memory Access (RDMA) over InfiniBand. It provides abstractions for fine-grained in-memory updates using one-sided RDMA, limiting data-movement costs during incremental model updates. The framework allows machine learning developers to specify the data flow and to apply communication and representation optimizations.
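To make the fine-grained update abstraction concrete, the sketch below simulates it in plain Python: workers push sparse deltas into a shared parameter table so that only the touched entries move, which is the access pattern a one-sided RDMA write would perform in remote memory. The `ParameterTable` class and its method names are hypothetical illustrations, not the framework's actual API, and the example runs in a single process rather than over InfiniBand.

```python
import numpy as np

class ParameterTable:
    """Hypothetical sketch of shared model state keyed by parameter id.

    Simulates fine-grained in-memory updates in-process; in the real
    framework, one-sided RDMA would apply such writes to remote memory
    without involving the remote CPU.
    """
    def __init__(self, shapes):
        self.params = {k: np.zeros(s) for k, s in shapes.items()}

    def apply_delta(self, key, indices, values):
        # Fine-grained incremental update: only the touched entries are
        # written, rather than shipping a full copy of the tensor.
        self.params[key][indices] += values

table = ParameterTable({"w": (8,)})
table.apply_delta("w", indices=[0, 3], values=[0.5, -0.2])
```

The design point this illustrates is that incremental updates touch a small fraction of the model, so avoiding whole-tensor copies (and remote CPU involvement) is where the data-movement savings come from.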
In the second part, we study how to reduce the computation cost of CNNs. We present an acceleration and compression method for CNNs in which we prune filters that are identified as having a small effect on prediction accuracy. By removing whole filters from the network together with their connecting feature maps, computation costs are reduced significantly. In contrast to pruning individual weights, this approach does not result in sparse connectivity patterns; hence, it does not need the support of sparse convolution libraries and can work with existing efficient BLAS libraries for dense matrix multiplication.
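A minimal sketch of whole-filter pruning is shown below. The proposal does not fix the criterion here, so this example assumes one common choice, ranking filters by their L1 norm, as a stand-in for "small effect on prediction accuracy"; the function name and `keep_ratio` parameter are illustrative.

```python
import numpy as np

def prune_filters_l1(weights, keep_ratio=0.5):
    """Rank conv filters by L1 norm and keep the top fraction.

    weights: array of shape (out_channels, in_channels, k, k).
    Returns the pruned weight tensor (still dense) and the kept indices.
    """
    n_filters = weights.shape[0]
    n_keep = max(1, int(n_filters * keep_ratio))
    # L1 norm of each filter: sum of absolute kernel values
    norms = np.abs(weights).reshape(n_filters, -1).sum(axis=1)
    # Keep the filters with the largest norms, in original order
    keep = np.sort(np.argsort(norms)[-n_keep:])
    return weights[keep], keep

# Toy conv layer: 4 filters, 3 input channels, 3x3 kernels
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3, 3, 3))
pruned, kept = prune_filters_l1(w, keep_ratio=0.5)
print(pruned.shape)  # (2, 3, 3, 3)
```

Note that the result is a smaller dense tensor, not a sparse one: the pruned layer (and the matching input channels of the next layer) can still be executed with ordinary dense BLAS routines, which is the practical advantage over weight-level pruning.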
In the third part, we propose two directions for future work. First, quantized neural networks have shown competitive accuracy with significantly fewer computing resources than their full-precision counterparts. However, all existing approaches train low-precision networks on high-power devices while keeping real-valued weights as a reference. We will explore optimization algorithms for training discrete neural networks on low-power devices. Second, designing a good network architecture is difficult and requires human expertise with time-consuming trial and error. We will explore how to automatically design CNN architectures for unknown tasks with reinforcement learning.
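To illustrate why existing quantized-training schemes need the real-valued reference copy, the toy example below trains a one-parameter linear model with BinaryConnect-style updates: the forward pass uses the binarized weight, but the gradient is accumulated into a full-precision shadow weight (a straight-through estimator). Eliminating this full-precision copy is exactly what on-device training of discrete networks would require. All names and the scalar model are illustrative, not part of the proposal.

```python
import numpy as np

def binarize(w):
    """Forward quantization: sign of the real-valued reference weight."""
    return np.where(w >= 0, 1.0, -1.0)

def train_step(w_real, x, y, lr=0.1):
    """One SGD step for the scalar model y_hat = binarize(w) * x.

    The forward pass uses the binary weight, but the update is applied
    to the real-valued copy (straight-through estimator). This shadow
    copy is the memory/compute cost low-power devices cannot afford.
    """
    y_hat = binarize(w_real) * x
    # Squared loss; straight-through: treat d y_hat / d w_real as x
    grad = 2.0 * (y_hat - y) * x
    return np.clip(w_real - lr * grad, -1.0, 1.0)

w = np.array(0.05)  # full-precision reference weight
for _ in range(20):
    w = train_step(w, x=1.0, y=-1.0)
print(binarize(w))  # -1.0
```

Because gradient steps are far smaller than the gap between quantization levels, updates applied directly to the binary weight would almost never flip it; the accumulated real-valued weight is what lets small gradients eventually change the sign.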
Co-Chairs: Dr. Hanan Samet, Dr. Tom Goldstein
Dept rep: Dr. David Mount
Member: Dr. David Jacobs