Search Results for author: Haibin Lin

Found 15 papers, 11 papers with code

CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs

1 code implementation • 16 Nov 2023 • Hanpeng Hu, Junwei Su, Juntao Zhao, Yanghua Peng, Yibo Zhu, Haibin Lin, Chuan Wu

Considering the large space of DNN models and devices, which makes direct profiling of all combinations impractical, recent efforts focus on building predictors that model the performance of DNN models on different devices.

Domain Adaptation
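
As a rough illustration of the cross-device prediction setting (not CDMPP's actual features, cost model, or predictor), the sketch below fits a regressor on latency samples profiled from two devices and evaluates it on an unseen third device; all feature names and the toy cost model are made up for the example.

```python
# Minimal sketch of a cross-device latency predictor (not CDMPP's actual
# feature set or model): fit a regressor on (model features, device features)
# pairs from profiled devices and test it on a held-out device.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

def synthetic_samples(device_peak_tflops, n=200):
    """Hypothetical features: FLOPs, bytes moved, and the device's peak TFLOPS."""
    flops = rng.uniform(1e9, 1e12, n)
    bytes_moved = rng.uniform(1e6, 1e9, n)
    latency = flops / (device_peak_tflops * 1e12) + bytes_moved / 5e11  # toy cost model
    X = np.column_stack([flops, bytes_moved, np.full(n, device_peak_tflops)])
    return X, latency

# Profile two "seen" devices, then predict on an unseen one (the domain-shift setting).
X_a, y_a = synthetic_samples(device_peak_tflops=15.0)
X_b, y_b = synthetic_samples(device_peak_tflops=30.0)
X_test, y_test = synthetic_samples(device_peak_tflops=20.0)

model = GradientBoostingRegressor().fit(np.vstack([X_a, X_b]), np.concatenate([y_a, y_b]))
rel_err = np.abs(model.predict(X_test) - y_test) / y_test
print(f"median relative error on unseen device: {np.median(rel_err):.2%}")
```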

LEMON: Lossless model expansion

no code implementations • 12 Oct 2023 • Yite Wang, Jiahao Su, Hanlin Lu, Cong Xie, Tianyi Liu, Jianbo Yuan, Haibin Lin, Ruoyu Sun, Hongxia Yang

Our empirical results demonstrate that LEMON reduces computational costs by 56.7% for Vision Transformers and 33.2% for BERT when compared to training from scratch.
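
For intuition on what "lossless expansion" means, here is a minimal width-expansion sketch for a single hidden layer in the Net2Net style: hidden units are duplicated and the outgoing weights are split so the network computes exactly the same function. LEMON's actual expansion rules for Transformers are more involved; this is only an illustration.

```python
# Minimal sketch of lossless width expansion for one hidden layer
# (Net2Net-style neuron duplication; not LEMON's actual expansion rules).
import torch

def expand_hidden(fc1: torch.nn.Linear, fc2: torch.nn.Linear, new_width: int):
    """Duplicate hidden units of fc1 and split fc2's outgoing weights so the
    composed function fc2(relu(fc1(x))) is unchanged."""
    old_width = fc1.out_features
    idx = torch.arange(new_width) % old_width          # which old unit each new unit copies
    counts = torch.bincount(idx, minlength=old_width).float()

    new_fc1 = torch.nn.Linear(fc1.in_features, new_width)
    new_fc2 = torch.nn.Linear(new_width, fc2.out_features)
    with torch.no_grad():
        new_fc1.weight.copy_(fc1.weight[idx])
        new_fc1.bias.copy_(fc1.bias[idx])
        # Divide outgoing weights by the duplication count to keep the output identical.
        new_fc2.weight.copy_(fc2.weight[:, idx] / counts[idx])
        new_fc2.bias.copy_(fc2.bias)
    return new_fc1, new_fc2

fc1, fc2 = torch.nn.Linear(8, 16), torch.nn.Linear(16, 4)
big1, big2 = expand_hidden(fc1, fc2, new_width=24)
x = torch.randn(2, 8)
small_out = fc2(torch.relu(fc1(x)))
big_out = big2(torch.relu(big1(x)))
print(torch.allclose(small_out, big_out, atol=1e-6))  # True: expansion is lossless
```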

ByteComp: Revisiting Gradient Compression in Distributed Training

no code implementations • 28 May 2022 • Zhuang Wang, Haibin Lin, Yibo Zhu, T. S. Eugene Ng

ByteComp first designs a decision-tree abstraction to express all compression strategies, then develops empirical models that timeline tensor computation, communication, and compression, enabling it to derive the intricate interactions among tensors.
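
Purely as an illustration of expressing a compression strategy space (not ByteComp's actual abstraction or empirical models), the snippet below enumerates per-tensor choices of compressor and placement and picks the cheapest under a stand-in cost model.

```python
# Illustrative only: a tiny strategy space in the spirit of a decision tree over
# per-tensor compression choices (not ByteComp's actual abstraction or models).
from itertools import product

algorithms = ["none", "fp16", "topk", "randomk"]       # which compressor, if any
placements = ["worker", "server"]                      # where (de)compression runs
tensors = ["layer0.grad", "layer1.grad"]               # hypothetical tensor names

def strategy_cost(strategy):
    """Stand-in cost model; a real system would use measured compute,
    communication, and compression timelines here."""
    comm = {"none": 1.0, "fp16": 0.5, "topk": 0.1, "randomk": 0.1}
    overhead = {"none": 0.0, "fp16": 0.02, "topk": 0.2, "randomk": 0.1}
    return sum(comm[alg] + overhead[alg] for _, (alg, _) in strategy.items())

# Enumerate every leaf of the decision tree: one (algorithm, placement) per tensor.
candidates = [
    dict(zip(tensors, choice))
    for choice in product(product(algorithms, placements), repeat=len(tensors))
]
best = min(candidates, key=strategy_cost)
print(best)
```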

dPRO: A Generic Profiling and Optimization System for Expediting Distributed DNN Training

no code implementations • 5 May 2022 • Hanpeng Hu, Chenyu Jiang, Yuchen Zhong, Yanghua Peng, Chuan Wu, Yibo Zhu, Haibin Lin, Chuanxiong Guo

Distributed training using multiple devices (e.g., GPUs) has been widely adopted for learning DNN models over large datasets.

Compressed Communication for Distributed Training: Adaptive Methods and System

1 code implementation • 17 May 2021 • Yuchen Zhong, Cong Xie, Shuai Zheng, Haibin Lin

Recently, there has been growing interest in using gradient compression to reduce the communication overhead of distributed training.
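
For context, a common form of gradient compression is top-k sparsification, sketched below; the adaptive methods and system proposed in this paper are not shown here.

```python
# Minimal sketch of top-k gradient sparsification (a common form of gradient
# compression; not the specific adaptive method proposed in this paper).
import torch

def compress_topk(grad: torch.Tensor, ratio: float = 0.01):
    """Keep only the largest-magnitude entries; send (values, indices)."""
    flat = grad.flatten()
    k = max(1, int(flat.numel() * ratio))
    _, indices = torch.topk(flat.abs(), k)
    return flat[indices], indices, grad.shape

def decompress_topk(values, indices, shape):
    out = torch.zeros(torch.Size(shape).numel(), dtype=values.dtype)
    out[indices] = values
    return out.view(shape)

g = torch.randn(1024, 1024)
vals, idx, shape = compress_topk(g, ratio=0.01)
g_hat = decompress_topk(vals, idx, shape)
# Only ~1% of the entries cross the wire; the rest are dropped (or, with error
# feedback, carried over to the next step).
print(vals.numel(), "of", g.numel(), "entries sent")
```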

CSER: Communication-efficient SGD with Error Reset

no code implementations • NeurIPS 2020 • Cong Xie, Shuai Zheng, Oluwasanmi Koyejo, Indranil Gupta, Mu Li

The scalability of Distributed Stochastic Gradient Descent (SGD) is today limited by communication bottlenecks.
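To see why compression error needs handling at all, here is a plain error-feedback loop around a toy compressor; CSER's error-reset rule and partial synchronization differ from this generic scheme.

```python
# A generic error-feedback loop around a compressor, shown only to illustrate
# why compression error has to be handled; CSER's error-reset rule and partial
# synchronization differ from this plain scheme.
import torch

def compress_sign(t: torch.Tensor):
    """Toy 1-bit compressor: sign scaled by the mean magnitude."""
    return t.sign() * t.abs().mean()

error = torch.zeros(10)          # residual kept locally across steps
for step in range(100):
    grad = torch.randn(10)       # stand-in for a stochastic gradient
    corrected = grad + error     # fold the previous residual back in
    sent = compress_sign(corrected)
    error = corrected - sent     # what the compressor lost this step
    # `sent` is what would be communicated and applied to the model.
```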

Accelerated Large Batch Optimization of BERT Pretraining in 54 minutes

1 code implementation • 24 Jun 2020 • Shuai Zheng, Haibin Lin, Sheng Zha, Mu Li

Using the proposed LANS method and the learning rate scheme, we scaled up the mini-batch sizes to 96K and 33K in phases 1 and 2 of BERT pretraining, respectively.

Natural Language Understanding
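
LANS belongs to the family of layer-wise adaptive large-batch optimizers (LARS/LAMB); the sketch below shows only the shared trust-ratio idea of scaling each layer's step by the ratio of parameter norm to update norm, not the exact LANS update or learning rate scheme from the paper.

```python
# Sketch of the layer-wise trust-ratio scaling used by large-batch optimizers
# in the LARS/LAMB family; the exact LANS update in the paper differs.
import torch

def layerwise_scaled_step(param: torch.Tensor, update: torch.Tensor, lr: float):
    """Scale the step so its norm is proportional to the parameter norm,
    which helps keep very large batch sizes stable layer by layer."""
    w_norm = param.norm().item()
    u_norm = update.norm().item()
    trust_ratio = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0
    param.data.add_(update, alpha=-lr * trust_ratio)

w = torch.randn(768, 768)
g = torch.randn(768, 768) * 1e-3  # stand-in for an Adam-style update direction
layerwise_scaled_step(w, g, lr=0.01)
```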

Is Network the Bottleneck of Distributed Training?

1 code implementation • 17 Jun 2020 • Zhen Zhang, Chaokun Chang, Haibin Lin, Yida Wang, Raman Arora, Xin Jin

As such, we advocate that the real challenge of distributed training is for the network community to develop high-performance network transport to fully utilize the network capacity and achieve linear scale-out.

GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing

4 code implementations • 9 Jul 2019 • Jian Guo, He He, Tong He, Leonard Lausen, Mu Li, Haibin Lin, Xingjian Shi, Chenguang Wang, Junyuan Xie, Sheng Zha, Aston Zhang, Hang Zhang, Zhi Zhang, Zhongyue Zhang, Shuai Zheng, Yi Zhu

We present GluonCV and GluonNLP, the deep learning toolkits for computer vision and natural language processing based on Apache MXNet (incubating).
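
A typical GluonCV usage pattern is loading a pretrained model from its model zoo, as sketched below; the exact model name and availability may vary across GluonCV versions.

```python
# Loading a pretrained model from the GluonCV model zoo (model name and
# availability may vary across GluonCV versions).
import mxnet as mx
from gluoncv import model_zoo

net = model_zoo.get_model('resnet50_v1b', pretrained=True)
x = mx.nd.random.uniform(shape=(1, 3, 224, 224))   # dummy image batch
logits = net(x)
print(logits.shape)                                 # (1, 1000) ImageNet classes
```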

Dynamic Mini-batch SGD for Elastic Distributed Training: Learning in the Limbo of Resources

2 code implementations • 26 Apr 2019 • Haibin Lin, Hang Zhang, Yifei Ma, Tong He, Zhi Zhang, Sheng Zha, Mu Li

One difficulty we observe is that the noise in the stochastic momentum estimation is accumulated over time and will have delayed effects when the batch size changes.

Image Classification • Object Detection +3
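
One simple piece of the elastic-training picture is rescaling the learning rate linearly when the effective batch size changes, sketched below; the dynamic SGD proposed in the paper additionally compensates the momentum buffer, which this sketch omits.

```python
# Minimal sketch of linearly rescaling the learning rate when the effective
# batch size changes during elastic training; the paper's method additionally
# compensates the momentum buffer, which is omitted here.
def rescale_lr(base_lr: float, base_batch: int, current_batch: int) -> float:
    """Linear scaling rule: keep lr / batch_size roughly constant."""
    return base_lr * current_batch / base_batch

base_lr, base_batch = 0.1, 256
for workers in [8, 4, 16]:              # cluster shrinks, then grows
    batch = 32 * workers                # 32 samples per worker (hypothetical)
    print(workers, "workers ->", rescale_lr(base_lr, base_batch, batch))
```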
