Communication-Efficient Heterogeneity-Aware Machine Learning System and Architecture
The key success of deep learning is the increasing size of models that can achieve high accuracy. At the same time, it is difficult to train the complex models with large data sets. Therefore, it is crucial to accelerate training with distributed systems and architectures, where communication and heterogeneity are two key challenges. In this talk, I will present two heterogeneity-aware decentralized training protocols without communication bottleneck. Specifically, Hop supports arbitrary iteration gap between workers by novel queue-based synchronization which can tolerate heterogeneity with system techniques. Prague uses randomized communication to tolerate heterogeneity with a new training algorithm based on partial reduce —— an efficient communication primitive. Moreover, I will present the systematic tensor partitioning for training on heterogeneous accelerator arrays (e.g., GPU/TPU). We believe that our principled approaches are crucial for achieving high-performance and efficient distributed training.