Search Results for author: Longteng Zhang

Found 4 papers, 1 paper with code

Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models

no code implementations · 7 Nov 2023 · Longteng Zhang, Xiang Liu, Zeyu Li, Xinglin Pan, Peijie Dong, Ruibo Fan, Rui Guo, Xin Wang, Qiong Luo, Shaohuai Shi, Xiaowen Chu

For end users, our benchmark and findings help them better understand different optimization techniques, training and inference frameworks, and hardware platforms when choosing configurations for deploying LLMs.

Quantization

FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs

no code implementations · 3 Sep 2023 · Zhenheng Tang, Yuxin Wang, Xin He, Longteng Zhang, Xinglin Pan, Qiang Wang, Rongfei Zeng, Kaiyong Zhao, Shaohuai Shi, Bingsheng He, Xiaowen Chu

The rapid growth of memory and computation requirements of large language models (LLMs) has outpaced the development of hardware, hindering people who lack large-scale high-end GPUs from training or deploying LLMs.

Scheduling

LoRA-FA: Memory-efficient Low-rank Adaptation for Large Language Models Fine-tuning

no code implementations · 7 Aug 2023 · Longteng Zhang, Lin Zhang, Shaohuai Shi, Xiaowen Chu, Bo Li

The low-rank adaptation (LoRA) method can greatly reduce the number of trainable parameters for fine-tuning large language models (LLMs); however, it still requires expensive activation memory to update the low-rank weights.
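To illustrate the idea behind the title and abstract: standard LoRA replaces an update to a frozen weight W with a trainable low-rank product BA, while LoRA-FA (per the paper title, "Frozen-A") additionally freezes A after initialization, so only B is trained. The sketch below is an illustrative NumPy toy, not the authors' implementation; all dimensions and initializations are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r = 64, 64, 8                       # layer dims; low rank r << d (assumed toy sizes)
W = rng.standard_normal((d, k))           # frozen pretrained weight
A = rng.standard_normal((r, k)) * 0.01    # LoRA-FA: A is frozen after random init
B = np.zeros((d, r))                      # only B is trainable; zero init

def forward(x):
    # Adapted layer: W x + B (A x). With B = 0 at init,
    # the adapted model starts identical to the pretrained one.
    return W @ x + B @ (A @ x)

x = rng.standard_normal(k)
assert np.allclose(forward(x), W @ x)     # zero-init B leaves outputs unchanged
```

Intuitively, updating B only requires storing the small projected activation A x (size r) rather than the full input x (size k), which is one way a frozen A can cut activation memory; treat that reading as an inference from the abstract rather than a statement of the paper's exact mechanism.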

Evaluation and Optimization of Gradient Compression for Distributed Deep Learning

1 code implementation · 15 Jun 2023 · Lin Zhang, Longteng Zhang, Shaohuai Shi, Xiaowen Chu, Bo Li

To accelerate distributed training, many gradient compression methods have been proposed to alleviate the communication bottleneck in synchronous stochastic gradient descent (S-SGD), but their efficacy in real-world applications still remains unclear.
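As background for the abstract above: a common gradient compression scheme evaluated in this line of work is top-k sparsification, where each worker transmits only the k largest-magnitude gradient entries instead of the full dense gradient. The sketch below is a generic illustration of that technique, not code from the paper's released implementation.

```python
import numpy as np

def topk_compress(grad, k):
    """Keep the k largest-magnitude entries; transmit (indices, values)."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

def decompress(idx, vals, size):
    """Reconstruct a dense (sparse-filled) gradient on the receiver side."""
    out = np.zeros(size)
    out[idx] = vals
    return out

# Toy example: keep 2 of 5 entries, shrinking communication volume.
g = np.array([0.1, -3.0, 0.02, 2.5, -0.4])
idx, vals = topk_compress(g, 2)
g_hat = decompress(idx, vals, g.size)
# Only the two largest-magnitude entries (-3.0 and 2.5) survive.
```

In practice, the entries dropped here are usually accumulated locally as error feedback and added back to the next iteration's gradient, which is part of what makes real-world efficacy harder to predict than the compression ratio alone suggests.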

Quantization
