Tackling Data Scarcity in Deep Learning

Talk
Anima Anandkumar
Caltech
Time: 08.30.2018 13:00 to 14:00
Location: AVW 4172

Modern deep learning has relied on large labeled datasets for training. However, such datasets are not readily available in all domains and are expensive or difficult to collect. We integrate data collection and aggregation with model training through active learning, partial feedback and crowdsourcing methods. We also develop sample-efficient training algorithms through the use of synthetic data, generative models and semi-supervised learning. In addition, we develop tensor-algebraic algorithms that efficiently encode multiple modalities and higher-order dependencies. These techniques can drastically reduce data requirements in a variety of domains.