1 code implementation • 2 Mar 2024 • Juntao Zhao, Borui Wan, Yanghua Peng, Haibin Lin, Chuan Wu
The immense size of LLMs has led to very high resource demands and costs for running these models.
1 code implementation • 23 Feb 2024 • Ziheng Jiang, Haibin Lin, Yinmin Zhong, Qi Huang, Yangrui Chen, Zhi Zhang, Yanghua Peng, Xiang Li, Cong Xie, Shibiao Nong, Yulu Jia, Sun He, Hongmin Chen, Zhihao Bai, Qi Hou, Shipeng Yan, Ding Zhou, Yiyao Sheng, Zhuo Jiang, Haohan Xu, Haoran Wei, Zhang Zhang, Pengfei Nie, Leqi Zou, Sida Zhao, Liang Xiang, Zherui Liu, Zhe Li, Xiaoying Jia, Jianxi Ye, Xin Jin, Xin Liu
Training LLMs at the scale of more than 10,000 GPUs brings unprecedented challenges to training efficiency and stability.
1 code implementation • 16 Nov 2023 • Hanpeng Hu, Junwei Su, Juntao Zhao, Yanghua Peng, Yibo Zhu, Haibin Lin, Chuan Wu
Because the large space of DNN models and devices makes direct profiling of all combinations impractical, recent efforts focus on building a predictor that models the performance of DNN models on different devices.
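The predictor idea generalizes: rather than profiling every model/device pair, one learns a mapping from model and device features to performance. A minimal sketch of that approach (not the paper's actual predictor; the features and data below are hypothetical):

```python
# Illustrative sketch only (not the paper's predictor): learn a mapping from
# (model features, device features) to measured latency, so that unprofiled
# model/device combinations can be estimated instead of exhaustively profiled.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Hypothetical profiled samples: each row concatenates model features
# (e.g., FLOPs, parameter count) with device features (e.g., peak TFLOPS,
# memory bandwidth). Labels are synthetic: roughly compute / throughput.
X = rng.uniform(0.1, 1.0, size=(500, 5))
y = X[:, 0] / X[:, 3] + 0.05 * rng.standard_normal(500)

predictor = GradientBoostingRegressor().fit(X, y)

# Estimate performance for an unseen model/device pair.
unseen_pair = rng.uniform(0.1, 1.0, size=(1, 5))
print("predicted latency:", predictor.predict(unseen_pair)[0])
```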
no code implementations • 5 May 2022 • Hanpeng Hu, Chenyu Jiang, Yuchen Zhong, Yanghua Peng, Chuan Wu, Yibo Zhu, Haibin Lin, Chuanxiong Guo
Distributed training using multiple devices (e.g., GPUs) has been widely adopted for learning DNN models over large datasets.
no code implementations • 16 Dec 2021 • Tianfeng Liu, Yangrui Chen, Dan Li, Chuan Wu, Yibo Zhu, Jun He, Yanghua Peng, Hongzheng Chen, Hongzhi Chen, Chuanxiong Guo
Extensive experiments on various GNN models and large graph datasets show that BGL significantly outperforms existing GNN training systems by 20.68x on average.
1 code implementation • 13 Sep 2019 • Yanghua Peng, Yixin Bao, Yangrui Chen, Chuan Wu, Chen Meng, Wei Lin
DL2 is a DL-driven scheduler for DL clusters, aiming to expedite training jobs globally by dynamically resizing the resources allocated to each job.
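As a rough illustration of the resizing idea (a toy stand-in for DL2's learned policy, not the paper's RL algorithm; the job features and weights below are hypothetical):

```python
# Toy stand-in for a DL-driven resizing policy (not DL2's actual RL agent):
# score each running job from hypothetical features and re-divide a fixed
# worker budget in proportion to the scores at every scheduling interval.
import numpy as np

TOTAL_WORKERS = 16  # assumes at least one worker per job fits in the budget

def resize(job_features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Map per-job features (e.g., progress, recent throughput) to worker counts."""
    logits = job_features @ weights           # `weights` stands in for a trained policy
    shares = np.exp(logits - logits.max())
    shares = shares / shares.sum()            # softmax over jobs
    alloc = np.maximum(1, (shares * TOTAL_WORKERS).astype(int))
    leftover = TOTAL_WORKERS - alloc.sum()
    if leftover > 0:                          # give spare workers to the top job
        alloc[np.argmax(shares)] += leftover
    return alloc

rng = np.random.default_rng(1)
features = rng.uniform(size=(3, 4))           # three jobs, four hypothetical features
print(resize(features, rng.uniform(size=4)))  # worker counts, summing to the budget
```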
1 code implementation • EuroSys 2018 • Yanghua Peng
A deep learning training job is resource-intensive and time-consuming.
no code implementations • 3 Jan 2018 • Yixin Bao, Yanghua Peng, Chuan Wu, Zongpeng Li
In a shared cluster handling multiple training jobs, a fundamental issue is how to efficiently schedule the jobs and set the number of concurrent workers for each one, so that server resources are maximally utilized and model training completes on time.
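One classic way to frame the worker-count part of this problem, assuming diminishing returns from additional workers, is marginal-gain greedy allocation. The sketch below illustrates that framing, not the paper's online algorithm; the throughput model is a hypothetical assumption:

```python
# Illustration of the worker-count problem above (not the paper's online
# algorithm): under a hypothetical diminishing-returns speedup model, greedily
# grant each next worker to the job with the largest marginal throughput gain.
import heapq

CAPACITY = 12                                   # total workers in the shared cluster
jobs = {"jobA": 1.0, "jobB": 0.6, "jobC": 0.3}  # hypothetical per-worker base throughput

def marginal_gain(base, n):
    """Extra throughput from adding worker n+1 (sublinear scaling assumption)."""
    return base / (n + 1)

alloc = {j: 0 for j in jobs}
heap = [(-marginal_gain(b, 0), j) for j, b in jobs.items()]  # max-heap via negation
heapq.heapify(heap)
for _ in range(CAPACITY):
    _, j = heapq.heappop(heap)
    alloc[j] += 1
    heapq.heappush(heap, (-marginal_gain(jobs[j], alloc[j]), j))

print(alloc)  # -> {'jobA': 6, 'jobB': 4, 'jobC': 2}
```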