1 code implementation • 10 Jun 2022 • Zhiquan Lai, Shengwei Li, Xudong Tang, Keshi Ge, Weijie Liu, Yabo Duan, Linbo Qiao, Dongsheng Li
These features make it necessary to apply 3D parallelism, which combines data parallelism, pipeline model parallelism, and tensor model parallelism, to achieve high training efficiency.
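A minimal sketch of the core idea behind 3D parallelism: the global set of devices is factored into a three-dimensional mesh, and each rank is assigned one coordinate along each axis. The mesh sizes below are illustrative assumptions, not the configuration used in the paper.

```python
# Sketch of mapping global ranks onto a (data, pipeline, tensor) device mesh.
# The mesh dimensions here are hypothetical, chosen only for illustration.

DATA_PARALLEL = 2      # number of full model replicas
PIPELINE_STAGES = 4    # pipeline stages within one replica
TENSOR_PARALLEL = 2    # devices jointly sharding one layer's tensors

WORLD_SIZE = DATA_PARALLEL * PIPELINE_STAGES * TENSOR_PARALLEL

def rank_to_coords(rank: int) -> tuple[int, int, int]:
    """Decompose a global rank into (data, pipeline, tensor) coordinates."""
    tensor = rank % TENSOR_PARALLEL
    pipeline = (rank // TENSOR_PARALLEL) % PIPELINE_STAGES
    data = rank // (TENSOR_PARALLEL * PIPELINE_STAGES)
    return data, pipeline, tensor

if __name__ == "__main__":
    for r in range(WORLD_SIZE):
        d, p, t = rank_to_coords(r)
        print(f"rank {r:2d} -> replica {d}, stage {p}, tensor shard {t}")
```

Ranks sharing a `data` coordinate form one pipeline-parallel model replica; ranks sharing `(data, pipeline)` cooperate on tensor-parallel layer shards; ranks that differ only in `data` exchange gradients via data-parallel all-reduce.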
no code implementations • 5 Oct 2021 • Keshi Ge, Yongquan Fu, Zhiquan Lai, Xiaoge Deng, Dongsheng Li
The distributed stochastic gradient descent (SGD) approach is widely used in large-scale deep learning, and the gradient collective communication method is vital to the training scalability of a distributed deep learning system.
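To make the role of the gradient collective concrete, here is a pure-Python simulation of ring all-reduce, a standard bandwidth-optimal collective for summing gradients across workers (reduce-scatter followed by all-gather). This is a generic sketch of the technique, not the paper's proposed method; the worker count and gradient sizes are illustrative.

```python
import numpy as np

def ring_allreduce(grads: list[np.ndarray]) -> list[np.ndarray]:
    """Simulate ring all-reduce over per-worker gradient vectors.

    Each gradient is split into n chunks. Chunks circulate around the ring:
    a reduce-scatter phase accumulates partial sums, then an all-gather phase
    distributes the fully reduced chunks, using 2*(n-1) sends per worker.
    """
    n = len(grads)
    chunks = [np.array_split(g.copy(), n) for g in grads]

    # Reduce-scatter: after n-1 steps, worker i holds the full sum of chunk (i+1) % n.
    for step in range(n - 1):
        sends = [(i, (i - step) % n, chunks[i][(i - step) % n].copy())
                 for i in range(n)]
        for i, c, data in sends:
            chunks[(i + 1) % n][c] += data

    # All-gather: circulate the reduced chunks so every worker holds all of them.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, chunks[i][(i + 1 - step) % n].copy())
                 for i in range(n)]
        for i, c, data in sends:
            chunks[(i + 1) % n][c] = data

    return [np.concatenate(ch) for ch in chunks]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grads = [rng.standard_normal(12) for _ in range(3)]  # 3 simulated workers
    reduced = ring_allreduce(grads)
    assert all(np.allclose(r, sum(grads)) for r in reduced)
    print("every worker holds the summed gradient:", reduced[0][:4])
```

Real systems implement this collective in communication libraries such as NCCL or MPI; the simulation above only shows why the per-worker traffic stays constant as the gradient is chunked across the ring.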
no code implementations • 11 Sep 2018 • Lei Guan, Linbo Qiao, Dongsheng Li, Tao Sun, Keshi Ge, Xicheng Lu
Support vector machines (SVMs) with sparsity-inducing nonconvex penalties have received considerable attention for their ability to perform automatic classification and variable selection simultaneously.
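As one concrete instance of such an objective, the sketch below combines the hinge loss with the minimax concave penalty (MCP), a common sparsity-inducing nonconvex penalty. The choice of MCP, the hyperparameters, and the synthetic data are illustrative assumptions, not necessarily the formulation studied in the paper.

```python
import numpy as np

def mcp_penalty(w: np.ndarray, lam: float = 0.5, gamma: float = 3.0) -> float:
    """Minimax concave penalty: grows like lam*|w| near zero but flattens out,
    so small weights are pushed to exactly zero while large ones are not shrunk."""
    a = np.abs(w)
    near = lam * a - a ** 2 / (2 * gamma)   # region |w| <= gamma * lam
    far = 0.5 * gamma * lam ** 2            # region |w| >  gamma * lam (constant)
    return float(np.sum(np.where(a <= gamma * lam, near, far)))

def svm_objective(w: np.ndarray, X: np.ndarray, y: np.ndarray,
                  lam: float = 0.5, gamma: float = 3.0) -> float:
    """Hinge loss plus a nonconvex penalty: classification and variable
    selection are handled by a single objective."""
    hinge = np.maximum(0.0, 1.0 - y * (X @ w)).mean()
    return hinge + mcp_penalty(w, lam, gamma)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((100, 5))
    true_w = np.array([2.0, -1.5, 0.0, 0.0, 0.0])  # only 2 informative features
    y = np.sign(X @ true_w + 0.1 * rng.standard_normal(100))
    print("objective at sparse truth:", svm_objective(true_w, X, y))
    print("objective at dense w:     ", svm_objective(np.ones(5), X, y))
```

Because the penalty is bounded, it avoids the over-shrinkage of the convex L1 penalty on large coefficients, which is the usual motivation for nonconvex alternatives, at the cost of a nonconvex optimization problem.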