1 code implementation • 2 Mar 2024 • Juntao Zhao, Borui Wan, Yanghua Peng, Haibin Lin, Chuan Wu
The immense size of LLMs has led to very high resource demands and costs for running these models.
1 code implementation • 23 Feb 2024 • Ziheng Jiang, Haibin Lin, Yinmin Zhong, Qi Huang, Yangrui Chen, Zhi Zhang, Yanghua Peng, Xiang Li, Cong Xie, Shibiao Nong, Yulu Jia, Sun He, Hongmin Chen, Zhihao Bai, Qi Hou, Shipeng Yan, Ding Zhou, Yiyao Sheng, Zhuo Jiang, Haohan Xu, Haoran Wei, Zhang Zhang, Pengfei Nie, Leqi Zou, Sida Zhao, Liang Xiang, Zherui Liu, Zhe Li, Xiaoying Jia, Jianxi Ye, Xin Jin, Xin Liu
Training LLMs at the scale of more than 10,000 GPUs brings unprecedented challenges to training efficiency and stability.
1 code implementation • 16 Nov 2023 • Hanpeng Hu, Junwei Su, Juntao Zhao, Yanghua Peng, Yibo Zhu, Haibin Lin, Chuan Wu
Because the large space of DNN models and devices makes direct profiling of all combinations impractical, recent efforts focus on building a predictor that models the performance of DNN models on different devices.
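The predictor idea generalizes: rather than profiling every model/device pair, one learns a mapping from model and device features to performance. A minimal sketch of that approach (not the paper's actual predictor; the features and data below are hypothetical):

```python
# Illustrative sketch only (not the paper's predictor): learn a mapping from
# (model features, device features) to measured latency, so that unprofiled
# model/device combinations can be estimated instead of exhaustively profiled.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Hypothetical profiled samples: each row concatenates model features
# (e.g., FLOPs, parameter count) with device features (e.g., peak TFLOPS,
# memory bandwidth). Labels are synthetic: roughly compute / throughput.
X = rng.uniform(0.1, 1.0, size=(500, 5))
y = X[:, 0] / X[:, 3] + 0.05 * rng.standard_normal(500)

predictor = GradientBoostingRegressor().fit(X, y)

# Estimate performance for an unseen model/device pair.
unseen_pair = rng.uniform(0.1, 1.0, size=(1, 5))
print("predicted latency:", predictor.predict(unseen_pair)[0])
```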
no code implementations • 5 May 2022 • Hanpeng Hu, Chenyu Jiang, Yuchen Zhong, Yanghua Peng, Chuan Wu, Yibo Zhu, Haibin Lin, Chuanxiong Guo
Distributed training using multiple devices (e.g., GPUs) has been widely adopted for learning DNN models over large datasets.
no code implementations • 16 Dec 2021 • Tianfeng Liu, Yangrui Chen, Dan Li, Chuan Wu, Yibo Zhu, Jun He, Yanghua Peng, Hongzheng Chen, Hongzhi Chen, Chuanxiong Guo
Extensive experiments on various GNN models and large graph datasets show that BGL significantly outperforms existing GNN training systems by 20.68x on average.
1 code implementation • 13 Sep 2019 • Yanghua Peng, Yixin Bao, Yangrui Chen, Chuan Wu, Chen Meng, Wei Lin
DL2 is a DL-driven scheduler for DL clusters, aiming to expedite training jobs globally by dynamically resizing the resources allocated to each job.
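As a rough illustration of the resizing idea (a toy stand-in for DL2's learned policy, not the paper's RL algorithm; the job features and weights below are hypothetical):

```python
# Toy stand-in for a DL-driven resizing policy (not DL2's actual RL agent):
# score each running job from hypothetical features and re-divide a fixed
# worker budget in proportion to the scores at every scheduling interval.
import numpy as np

TOTAL_WORKERS = 16  # assumes at least one worker per job fits in the budget

def resize(job_features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Map per-job features (e.g., progress, recent throughput) to worker counts."""
    logits = job_features @ weights           # `weights` stands in for a trained policy
    shares = np.exp(logits - logits.max())
    shares = shares / shares.sum()            # softmax over jobs
    alloc = np.maximum(1, (shares * TOTAL_WORKERS).astype(int))
    leftover = TOTAL_WORKERS - alloc.sum()
    if leftover > 0:                          # give spare workers to the top job
        alloc[np.argmax(shares)] += leftover
    return alloc

rng = np.random.default_rng(1)
features = rng.uniform(size=(3, 4))           # three jobs, four hypothetical features
print(resize(features, rng.uniform(size=4)))  # worker counts, summing to the budget
```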
1 code implementation • EuroSys 2018 • Yanghua Peng
A deep learning training job is resource-intensive and time-consuming.
no code implementations • 3 Jan 2018 • Yixin Bao, Yanghua Peng, Chuan Wu, Zongpeng Li
In a shared cluster handling multiple training jobs, a fundamental issue is how to efficiently schedule the jobs and set the number of concurrent workers for each one, so that server resources are maximally utilized and model training completes on time.
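One classic way to frame the worker-count part of this problem, assuming diminishing returns from additional workers, is marginal-gain greedy allocation. The sketch below illustrates that framing, not the paper's online algorithm; the throughput model is a hypothetical assumption:

```python
# Illustration of the worker-count problem above (not the paper's online
# algorithm): under a hypothetical diminishing-returns speedup model, greedily
# grant each next worker to the job with the largest marginal throughput gain.
import heapq

CAPACITY = 12                                   # total workers in the shared cluster
jobs = {"jobA": 1.0, "jobB": 0.6, "jobC": 0.3}  # hypothetical per-worker base throughput

def marginal_gain(base, n):
    """Extra throughput from adding worker n+1 (sublinear scaling assumption)."""
    return base / (n + 1)

alloc = {j: 0 for j in jobs}
heap = [(-marginal_gain(b, 0), j) for j, b in jobs.items()]  # max-heap via negation
heapq.heapify(heap)
for _ in range(CAPACITY):
    _, j = heapq.heappop(heap)
    alloc[j] += 1
    heapq.heappush(heap, (-marginal_gain(jobs[j], alloc[j]), j))

print(alloc)  # -> {'jobA': 6, 'jobB': 4, 'jobC': 2}
```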