Freeing Hybrid Distributed AI Training Configuration

Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering 2021 · Haoran Wang ·

Deep neural network (DNN) has become the leading technology to realize Artificial Intelligence (AI). As DNN models become larger and more complex, so do datasets. Being able to efficiently train DNNs in parallel has become a crucial need. Data Parallelism (DP) is the widest-used solution today to accelerate DNN training but could be inefficient when processing DNNs with large-size parameters. Hybrid Parallelism (HP), which applies different parallel strategies on different parts of DNNs, is more efficient but requires advanced configurations. Not all AI researchers are experts in parallel computing, thus automating the configuration of HP strategies is very desirable for all AI frameworks. We propose a parallel semantics analysis method, which can analyze the trade-offs among different kinds of parallelisms and systematically choose the HP strategies with good training time performance. We demonstrated experimentally 260% speedup when applying our method compared to using a conventional DP approach. With our proposal, AI researchers would be able to focus more on AI algorithm research without being disturbed by parallel analysis and engineering concerns.

PDF Abstract