Search Results for author: Haibin Lin

Found 15 papers, 11 papers with code

LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization

1 code implementation • 2 Mar 2024 • Juntao Zhao, Borui Wan, Yanghua Peng, Haibin Lin, Chuan Wu

The immense sizes of LLMs have led to very high resource demand and cost for running the models.

Paper
Code

MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs

1 code implementation • 23 Feb 2024 • Ziheng Jiang, Haibin Lin, Yinmin Zhong, Qi Huang, Yangrui Chen, Zhi Zhang, Yanghua Peng, Xiang Li, Cong Xie, Shibiao Nong, Yulu Jia, Sun He, Hongmin Chen, Zhihao Bai, Qi Hou, Shipeng Yan, Ding Zhou, Yiyao Sheng, Zhuo Jiang, Haohan Xu, Haoran Wei, Zhang Zhang, Pengfei Nie, Leqi Zou, Sida Zhao, Liang Xiang, Zherui Liu, Zhe Li, Xiaoying Jia, Jianxi Ye, Xin Jin, Xin Liu

Training LLMs at this scale brings unprecedented challenges to training efficiency and stability.

Language Modelling Large Language Model

343

Paper
Code

CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs

1 code implementation • 16 Nov 2023 • Hanpeng Hu, Junwei Su, Juntao Zhao, Yanghua Peng, Yibo Zhu, Haibin Lin, Chuan Wu

Considering the large space of DNN models and devices that impede direct profiling of all combinations, recent efforts focus on building a predictor to model the performance of DNN models on different devices.

Domain Adaptation

Paper
Code

LEMON: Lossless model expansion

no code implementations • 12 Oct 2023 • Yite Wang, Jiahao Su, Hanlin Lu, Cong Xie, Tianyi Liu, Jianbo Yuan, Haibin Lin, Ruoyu Sun, Hongxia Yang

Our empirical results demonstrate that LEMON reduces computational costs by 56. 7% for Vision Transformers and 33. 2% for BERT when compared to training from scratch.

Paper
Add Code

ByteComp: Revisiting Gradient Compression in Distributed Training

no code implementations • 28 May 2022 • Zhuang Wang, Haibin Lin, Yibo Zhu, T. S. Eugene Ng

It first designs a decision tree abstraction to express all the compression strategies and develops empirical models to timeline tensor computation, communication, and compression to enable ByteComp to derive the intricate interactions among tensors.

Paper
Add Code

dPRO: A Generic Profiling and Optimization System for Expediting Distributed DNN Training

no code implementations • 5 May 2022 • Hanpeng Hu, Chenyu Jiang, Yuchen Zhong, Yanghua Peng, Chuan Wu, Yibo Zhu, Haibin Lin, Chuanxiong Guo

Distributed training using multiple devices (e. g., GPUs) has been widely adopted for learning DNN models over large datasets.

Paper
Add Code

Compressed Communication for Distributed Training: Adaptive Methods and System

1 code implementation • 17 May 2021 • Yuchen Zhong, Cong Xie, Shuai Zheng, Haibin Lin

Recently, there has been a growing interest in using gradient compression to reduce the communication overhead of the distributed training.

Paper
Code

CSER: Communication-efficient SGD with Error Reset

no code implementations • NeurIPS 2020 • Cong Xie, Shuai Zheng, Oluwasanmi Koyejo, Indranil Gupta, Mu Li, Haibin Lin

The scalability of Distributed Stochastic Gradient Descent (SGD) is today limited by communication bottlenecks.

Paper
Add Code

Accelerated Large Batch Optimization of BERT Pretraining in 54 minutes

1 code implementation • 24 Jun 2020 • Shuai Zheng, Haibin Lin, Sheng Zha, Mu Li

Using the proposed LANS method and the learning rate scheme, we scaled up the mini-batch sizes to 96K and 33K in phases 1 and 2 of BERT pretraining, respectively.

Natural Language Understanding

Paper
Code

Is Network the Bottleneck of Distributed Training?

1 code implementation • 17 Jun 2020 • Zhen Zhang, Chaokun Chang, Haibin Lin, Yida Wang, Raman Arora, Xin Jin

As such, we advocate that the real challenge of distributed training is for the network community to develop high-performance network transport to fully utilize the network capacity and achieve linear scale-out.

Paper
Code

ResNeSt: Split-Attention Networks

35 code implementations • 19 Apr 2020 • Hang Zhang, Chongruo wu, Zhongyue Zhang, Yi Zhu, Haibin Lin, Zhi Zhang, Yue Sun, Tong He, Jonas Mueller, R. Manmatha, Mu Li, Alexander Smola

It is well known that featuremap attention and multi-path representation are important for visual recognition.

Ranked #8 on Instance Segmentation on COCO test-dev (APM metric)

Image Classification Instance Segmentation +3

29,758

Paper
Code

Local AdaAlter: Communication-Efficient Stochastic Gradient Descent with Adaptive Learning Rates

1 code implementation • 20 Nov 2019 • Cong Xie, Oluwasanmi Koyejo, Indranil Gupta, Haibin Lin

When scaling distributed training, the communication overhead is often the bottleneck.

Paper
Code

VisDrone-DET2019: The Vision Meets Drone Object Detection in Image Challenge Results

1 code implementation • International Conference on Computer Vision Workshops 2019 • Dawei Du, Pengfei Zhu, Longyin Wen, Xiao Bian, Haibin Lin, QinGhua Hu, Tao Peng, Jiayu Zheng, Xinyao Wang, Yue Zhang, Liefeng Bo, Hailin Shi, Rui Zhu, Aashish Kumar, Aijin Li, Almaz Zinollayev, Anuar Askergaliyev, Arne Schumann, Binjie Mao, Byeongwon Lee, Chang Liu, Changrui Chen, Chunhong Pan, Chunlei Huo, Da Yu, Dechun Cong, Dening Zeng, Dheeraj Reddy Pailla, Di Li, Dong Wang, Donghyeon Cho, Dongyu Zhang, Furui Bai, George Jose, Guangyu Gao, Guizhong Liu, Haitao Xiong, Hao Qi, Haoran Wang, Heqian Qiu, Hongliang Li, Huchuan Lu, Ildoo Kim, Jaekyum Kim, Jane Shen, Jihoon Lee, Jing Ge, Jingjing Xu, Jingkai Zhou, Jonas Meier, Jun Won Choi, Junhao Hu, Junyi Zhang, Junying Huang, Kaiqi Huang, Keyang Wang, Lars Sommer, Lei Jin, Lei Zhang

Results of 33 object detection algorithms are presented.

Object object-detection +1

12,059

Paper
Code

GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing

4 code implementations • 9 Jul 2019 • Jian Guo, He He, Tong He, Leonard Lausen, Mu Li, Haibin Lin, Xingjian Shi, Chenguang Wang, Junyuan Xie, Sheng Zha, Aston Zhang, Hang Zhang, Zhi Zhang, Zhongyue Zhang, Shuai Zheng, Yi Zhu

We present GluonCV and GluonNLP, the deep learning toolkits for computer vision and natural language processing based on Apache MXNet (incubating).

2,548

Paper
Code

Dynamic Mini-batch SGD for Elastic Distributed Training: Learning in the Limbo of Resources

2 code implementations • 26 Apr 2019 • Haibin Lin, Hang Zhang, Yifei Ma, Tong He, Zhi Zhang, Sheng Zha, Mu Li

One difficulty we observe is that the noise in the stochastic momentum estimation is accumulated over time and will have delayed effects when the batch size changes.

Image Classification object-detection +3

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.