Search Results for author: Xiaonan Nie

Found 6 papers, 4 papers with code

Improving Automatic Parallel Training via Balanced Memory Workload Optimization

1 code implementation • 5 Jul 2023 • Yujie Wang, Youhe Jiang, Xupeng Miao, Fangcheng Fu, Shenhan Zhu, Xiaonan Nie, Yaofeng Tu, Bin Cui

Transformer models have emerged as the leading approach for achieving state-of-the-art performance across various application domains, serving as the foundation for advanced large-scale deep learning (DL) models.


FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement

no code implementations • 8 Apr 2023 • Xiaonan Nie, Xupeng Miao, Zilong Wang, Zichao Yang, Jilong Xue, Lingxiao Ma, Gang Cao, Bin Cui

We first present an empirical analysis of the problems and opportunities in training MoE models, which motivates us to overcome the routing imbalance and fluctuation problems with a dynamic expert management and device placement mechanism (see the illustrative sketch after this entry).

Scheduling
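The FlexMoE entry above refers to routing imbalance: some experts receive far more tokens than others, so the devices hosting them become bottlenecks. The following is a minimal, hypothetical Python sketch of that idea and of a naive dynamic re-placement policy; the function names, the replication threshold, and the placement data structure are illustrative assumptions, not FlexMoE's actual mechanism.

```python
# Hypothetical sketch (not FlexMoE's implementation): measure per-expert token
# load and replicate overloaded experts onto the least-loaded device.
import numpy as np

def expert_loads(routing_decisions: np.ndarray, num_experts: int) -> np.ndarray:
    """Count how many tokens each expert receives in the current step."""
    return np.bincount(routing_decisions, minlength=num_experts)

def rebalance(placement: dict, loads: np.ndarray, threshold: float = 2.0) -> dict:
    """Replicate experts whose load exceeds `threshold` x the mean load.

    `placement` maps expert id -> list of device ids hosting that expert.
    """
    mean_load = loads.mean()
    for expert_id, load in enumerate(loads):
        if load > threshold * mean_load:
            # Approximate each device's load by the experts it already hosts,
            # then add a replica on the currently least-loaded device.
            device_load = {}
            for e, devs in placement.items():
                for d in devs:
                    device_load[d] = device_load.get(d, 0) + loads[e] / len(devs)
            target = min(device_load, key=device_load.get)
            if target not in placement[expert_id]:
                placement[expert_id].append(target)
    return placement

# Example: 4 experts on 2 devices; expert 0 is heavily over-routed.
placement = {0: [0], 1: [0], 2: [1], 3: [1]}
routes = np.array([0] * 70 + [1] * 10 + [2] * 10 + [3] * 10)
loads = expert_loads(routes, num_experts=4)
print(rebalance(placement, loads))  # expert 0 gains a replica on device 1
```

In this toy run, expert 0 handles 70 of 100 tokens, exceeding twice the mean load, so a replica is placed on the less-loaded device; a real system would also migrate or retire replicas as routing statistics fluctuate over training.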

Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent

no code implementations • 6 Mar 2023 • Xiaonan Nie, Yi Liu, Fangcheng Fu, Jinbao Xue, Dian Jiao, Xupeng Miao, Yangyu Tao, Bin Cui

Recent years have witnessed unprecedented achievements by large-scale pre-trained models, especially Transformer models.

Management • Scheduling

Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism

2 code implementations • 25 Nov 2022 • Xupeng Miao, Yujie Wang, Youhe Jiang, Chunan Shi, Xiaonan Nie, Hailin Zhang, Bin Cui

Transformer models have achieved state-of-the-art performance across various application domains and have gradually become the foundation of advanced large-scale deep learning (DL) models.
