Large-Scale Machine Learning Model Computing and Compression

Heng Huang
10.06.2023 11:00 to 12:00

Machine learning and artificial intelligence are gaining fresh momentum and have helped enhance not only many industrial and professional processes but also our everyday lives. The recent success of machine learning relies heavily on the surge of big data, big models, and big computing. However, inefficient algorithms often restrict the application of machine learning to very large-scale tasks. For big data, serious concerns such as communication overhead and convergence speed must be rigorously addressed when we train learning models on large amounts of data distributed across multiple computers or devices. For big models, how to train a model that is too big to train on a single computer or device remains an underexplored research area. To address these challenging problems, we have focused on designing new ultra-scale machine learning algorithms, efficiently optimizing and training models for big data problems, and pursuing new discoveries in both theory and applications. I will present our recent research results on model parallelization algorithms that address the big model problem in deep neural networks, and on distributed learning methods that address big data computing issues. To deploy big models in real-world applications with limited computational budgets, we have also designed new model pruning approaches that compress big machine learning models to a much smaller size while maintaining good performance. I will introduce our newly developed model pruning techniques, including interpretation-enhanced model pruning and multimodal transformer pruning.