2 code implementations • 6 Dec 2023 • Jiayi Pan, Chengcan Wang, Kaifu Zheng, Yangguang Li, Zhenyu Wang, Bin Feng
Our results show that, with SmoothQuant+, the Code Llama-34B model can be quantized and deployed on a single A100 40GB GPU, achieving lossless accuracy and a throughput increase of 1.9 to 4.0 times compared to the FP16 model deployed on two A100 40GB GPUs.
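The memory savings behind fitting a 34B model on one 40GB GPU come from low-bit weight quantization. The following is a toy sketch of symmetric group-wise low-bit weight quantization, the general family of techniques SmoothQuant+ belongs to; it is not the paper's implementation, and the function names and group size are illustrative assumptions.

```python
import numpy as np

def quantize_int4(w, group_size=128):
    # Symmetric per-group 4-bit quantization: each group of
    # `group_size` weights shares one scale factor (illustrative sketch,
    # not the SmoothQuant+ algorithm itself).
    w = w.reshape(-1, group_size)
    # int4 symmetric range is [-8, 7]; map the group max to 7
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q, scale):
    # Reconstruct approximate FP32 weights from int4 codes and scales
    return (q.astype(np.float32) * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=4096).astype(np.float32)  # toy weight tensor
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s)
print(f"max abs error: {np.abs(w - w_hat).max():.6f}")
```

Storing 4-bit codes plus one scale per 128 weights uses roughly a quarter of the memory of FP16 weights, which is what allows a model that needs two GPUs in FP16 to fit on one.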