Search Results for author: Haibin Lin

Found 20 papers, 13 papers with code

HybridFlow: A Flexible and Efficient RLHF Framework

no code implementations • 28 Sep 2024 • Guangming Sheng, Chi Zhang, Zilingfeng Ye, Xibin Wu, Wang Zhang, Ru Zhang, Yanghua Peng, Haibin Lin, Chuan Wu

Traditional RL can be modeled as a dataflow, where each node represents the computation of a neural network (NN) and each edge denotes data dependencies between the NNs.

Large Language Model
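
The dataflow framing in the abstract above lends itself to a small illustration: nodes are NN computations in one RLHF iteration and edges are data dependencies. This is a hedged sketch only; the node names are hypothetical and it does not reflect HybridFlow's actual API.

```python
# Hypothetical RLHF iteration expressed as a dataflow graph (illustrative only;
# not HybridFlow's API). Requires Python 3.9+ for graphlib.
from graphlib import TopologicalSorter

# Each node maps to the nodes it depends on.
rlhf_dataflow = {
    "actor_generate": [],                         # roll out responses
    "reward_score":   ["actor_generate"],         # score responses
    "ref_logprobs":   ["actor_generate"],         # reference policy log-probs
    "critic_values":  ["actor_generate"],         # value estimates
    "actor_update":   ["reward_score", "ref_logprobs", "critic_values"],
    "critic_update":  ["reward_score", "critic_values"],
}

# Execute the NN computations in dependency order.
for node in TopologicalSorter(rlhf_dataflow).static_order():
    print(f"run {node} after {rlhf_dataflow[node]}")
```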

Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation

no code implementations • 7 Aug 2024 • Weiqi Feng, Yangrui Chen, Shaoyu Wang, Yanghua Peng, Haibin Lin, Minlan Yu

Multimodal large language models (MLLMs) have extended the success of large language models (LLMs) to multiple data types, such as image, text, and audio, achieving strong performance in various domains, including multimodal translation, visual question answering, and content generation.

Question Answering • Scheduling +1

ByteCheckpoint: A Unified Checkpointing System for LLM Development

no code implementations • 29 Jul 2024 • Borui Wan, Mingji Han, Yiyao Sheng, Zhichao Lai, Mofan Zhang, Junda Zhang, Yanghua Peng, Haibin Lin, Xin Liu, Chuan Wu

In addition, when transferring checkpoints across tasks, checkpoint resharding, defined as loading checkpoints into parallel configurations that differ from those used for saving, is often required to match the characteristics and resource quota of specific tasks.
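
Checkpoint resharding, as described above, boils down to re-partitioning saved parameter shards into a different parallel layout. A minimal hedged illustration follows (not ByteCheckpoint's implementation; the shapes and sharding axis are made up):

```python
# Illustrative checkpoint resharding (not ByteCheckpoint's code): a weight saved
# under 4-way tensor parallelism is loaded into a 2-way layout.
import numpy as np

full_weight = np.arange(32, dtype=np.float32).reshape(8, 4)

# Saving side: the weight was sharded row-wise across 4 workers.
saved_shards = np.split(full_weight, 4, axis=0)      # four (2, 4) shards

# Loading side with a different parallel configuration: merge, then re-split.
merged = np.concatenate(saved_shards, axis=0)
resharded = np.split(merged, 2, axis=0)              # two (4, 4) shards

assert all(s.shape == (4, 4) for s in resharded)
```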

QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices

1 code implementation • 2 Jul 2024 • Juntao Zhao, Borui Wan, Yanghua Peng, Haibin Lin, Yibo Zhu, Chuan Wu

A number of production deep learning clusters have explored using inference hardware for DNN training during off-peak serving hours, when many inference GPUs would otherwise sit idle.

Quantization

FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion

no code implementations • 11 Jun 2024 • Li-Wen Chang, Wenlei Bao, Qi Hou, Chengquan Jiang, Ningxin Zheng, Yinmin Zhong, Xuanrun Zhang, Zuquan Song, Ziheng Jiang, Haibin Lin, Xin Jin, Xin Liu

Overall, it can achieve up to 1.24x speedups for training over Megatron-LM on a cluster of 128 GPUs with various GPU generations and interconnects, and up to 1.66x and 1.30x speedups for prefill and decoding inference over vLLM on a cluster of 8 GPUs with various GPU generations and interconnects.

CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs

1 code implementation • 16 Nov 2023 • Hanpeng Hu, Junwei Su, Juntao Zhao, Yanghua Peng, Yibo Zhu, Haibin Lin, Chuan Wu

Considering the large space of DNN models and devices, which impedes direct profiling of all combinations, recent efforts focus on building a predictor to model the performance of DNN models on different devices.

Domain Adaptation
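
To make the predictor idea concrete, here is a hedged toy example that fits a regressor from simple tensor-program features to measured latency. The feature set and model choice are illustrative assumptions, not CDMPP's actual design.

```python
# Toy cross-device latency predictor (illustrative only; not CDMPP's features or model).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
flops   = rng.uniform(1e9, 1e12, size=200)   # hypothetical per-program FLOPs
traffic = rng.uniform(1e6, 1e9, size=200)    # hypothetical memory traffic (bytes)
peak    = rng.uniform(10, 100, size=200)     # hypothetical device peak TFLOPS
X = np.stack([flops, traffic, peak], axis=1)

# Synthetic "measured" latency: roofline-style ground truth plus noise.
y = flops / (peak * 1e12) + traffic / 900e9 + rng.normal(0, 1e-5, size=200)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print("predicted latency (s):", model.predict(X[:1])[0])
```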

LEMON: Lossless model expansion

1 code implementation • 12 Oct 2023 • Yite Wang, Jiahao Su, Hanlin Lu, Cong Xie, Tianyi Liu, Jianbo Yuan, Haibin Lin, Ruoyu Sun, Hongxia Yang

Our empirical results demonstrate that LEMON reduces computational costs by 56.7% for Vision Transformers and 33.2% for BERT when compared to training from scratch.

ByteComp: Revisiting Gradient Compression in Distributed Training

no code implementations • 28 May 2022 • Zhuang Wang, Haibin Lin, Yibo Zhu, T. S. Eugene Ng

It first designs a decision-tree abstraction to express all compression strategies, then develops empirical models that timeline tensor computation, communication, and compression, enabling ByteComp to derive the intricate interactions among tensors.
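
As a rough intuition for the abstraction described above, the sketch below enumerates per-tensor compression options against a toy cost model and keeps the cheapest choice. All option names and constants are invented for illustration; this is not ByteComp's actual decision tree or empirical model.

```python
# Illustrative per-tensor compression planning (not ByteComp's model).

def modeled_time(tensor_bytes, strategy):
    """Toy cost model: compression overhead plus communication of compressed bytes."""
    compress_cost = {"none": 0.0, "fp16": 1e-10, "topk": 5e-10}[strategy]  # s/byte
    ratio = {"none": 1.0, "fp16": 0.5, "topk": 0.01}                       # size kept
    bandwidth = 10e9                                                       # bytes/s, assumed
    return tensor_bytes * compress_cost + tensor_bytes * ratio[strategy] / bandwidth

tensors = {"embedding": 400e6, "attention.qkv": 50e6, "mlp.fc1": 100e6}    # bytes
plan = {
    name: min(("none", "fp16", "topk"), key=lambda s: modeled_time(size, s))
    for name, size in tensors.items()
}
print(plan)
```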

dPRO: A Generic Profiling and Optimization System for Expediting Distributed DNN Training

no code implementations • 5 May 2022 • Hanpeng Hu, Chenyu Jiang, Yuchen Zhong, Yanghua Peng, Chuan Wu, Yibo Zhu, Haibin Lin, Chuanxiong Guo

Distributed training using multiple devices (e.g., GPUs) has been widely adopted for learning DNN models over large datasets.

Compressed Communication for Distributed Training: Adaptive Methods and System

1 code implementation • 17 May 2021 • Yuchen Zhong, Cong Xie, Shuai Zheng, Haibin Lin

Recently, there has been growing interest in using gradient compression to reduce the communication overhead of distributed training.
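
As a concrete example of gradient compression, the sketch below shows top-k sparsification with error feedback, a common generic scheme; it is not necessarily the adaptive method proposed in this paper.

```python
# Generic top-k gradient compression with error feedback (illustrative sketch).
import torch

def topk_compress(grad, residual, k_ratio=0.01):
    """Keep the largest-magnitude fraction of entries; carry dropped mass forward."""
    g = grad + residual                       # error feedback: re-add previously dropped error
    k = max(1, int(g.numel() * k_ratio))
    _, indices = torch.topk(g.abs().flatten(), k)
    mask = torch.zeros(g.numel(), dtype=g.dtype, device=g.device)
    mask[indices] = 1.0
    compressed = (g.flatten() * mask).view_as(g)
    return compressed, g - compressed         # compressed gradient, new residual

grad = torch.randn(1024)
compressed, residual = topk_compress(grad, torch.zeros_like(grad))
print(int(compressed.count_nonzero()), "of", grad.numel(), "entries transmitted")
```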

CSER: Communication-efficient SGD with Error Reset

no code implementations • NeurIPS 2020 • Cong Xie, Shuai Zheng, Oluwasanmi Koyejo, Indranil Gupta, Mu Li, Haibin Lin

The scalability of Distributed Stochastic Gradient Descent (SGD) is today limited by communication bottlenecks.

Accelerated Large Batch Optimization of BERT Pretraining in 54 minutes

1 code implementation • 24 Jun 2020 • Shuai Zheng, Haibin Lin, Sheng Zha, Mu Li

Using the proposed LANS method and the learning rate scheme, we scaled up the mini-batch sizes to 96K and 33K in phases 1 and 2 of BERT pretraining, respectively.

Natural Language Understanding

Is Network the Bottleneck of Distributed Training?

1 code implementation • 17 Jun 2020 • Zhen Zhang, Chaokun Chang, Haibin Lin, Yida Wang, Raman Arora, Xin Jin

As such, we advocate that the real challenge of distributed training is for the network community to develop high-performance network transport to fully utilize the network capacity and achieve linear scale-out.

GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing

3 code implementations • 9 Jul 2019 • Jian Guo, He He, Tong He, Leonard Lausen, Mu Li, Haibin Lin, Xingjian Shi, Chenguang Wang, Junyuan Xie, Sheng Zha, Aston Zhang, Hang Zhang, Zhi Zhang, Zhongyue Zhang, Shuai Zheng, Yi Zhu

We present GluonCV and GluonNLP, the deep learning toolkits for computer vision and natural language processing based on Apache MXNet (incubating).
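
For readers unfamiliar with the toolkits, a typical GluonCV usage pattern looks roughly like the following; exact model names and API details may vary across versions, so treat this as a hedged sketch.

```python
# Rough GluonCV usage sketch (API details may vary by version).
import mxnet as mx
from gluoncv import model_zoo
from gluoncv.data.transforms.presets.imagenet import transform_eval

# Load a pretrained image classifier from the model zoo.
net = model_zoo.get_model('resnet50_v1b', pretrained=True)

# Classify a local image (replace 'example.jpg' with any image on disk).
img = transform_eval(mx.image.imread('example.jpg'))
pred = net(img)
print('top-1 class index:', int(pred.argmax(axis=1).asscalar()))
```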

Dynamic Mini-batch SGD for Elastic Distributed Training: Learning in the Limbo of Resources

2 code implementations • 26 Apr 2019 • Haibin Lin, Hang Zhang, Yifei Ma, Tong He, Zhi Zhang, Sheng Zha, Mu Li

One difficulty we observe is that the noise in the stochastic momentum estimation is accumulated over time and will have delayed effects when the batch size changes.

Image Classification • object-detection +3
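
The observation above concerns momentum SGD under a changing batch size in elastic training. The sketch below shows one common heuristic, rescaling the learning rate linearly with the batch size while momentum state carries over; it is an illustration of the setting, not the paper's exact remedy.

```python
# Momentum SGD under a batch-size change (illustrative; linear LR scaling is a
# common heuristic, not necessarily the scheme proposed in the paper).
import torch

model = torch.nn.Linear(10, 1)
base_lr, base_batch = 0.1, 32
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9)

def on_batch_size_change(new_batch):
    # Rescale the learning rate with the new batch size. The momentum buffers
    # still hold noise accumulated under the old batch size, which is the
    # delayed effect the paper points out.
    for group in optimizer.param_groups:
        group['lr'] = base_lr * new_batch / base_batch

on_batch_size_change(128)   # e.g., more workers join, effective batch grows
print(optimizer.param_groups[0]['lr'])  # ~0.4
```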
