Search Results for author: Kaifu Zheng

Found 1 paper, 1 paper with code

SmoothQuant+: Accurate and Efficient 4-bit Post-Training Weight Quantization for LLM

2 code implementations • 6 Dec 2023 • Jiayi Pan, Chengcan Wang, Kaifu Zheng, Yangguang Li, Zhenyu Wang, Bin Feng

Our results show that, with SmoothQuant+, the Code Llama-34B model can be quantized and deployed on an A100 40GB GPU, achieving lossless accuracy and a throughput increase of 1.9 to 4.0 times compared to the FP16 model deployed on two A100 40GB GPUs.
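For context on what 4-bit post-training weight quantization in the SmoothQuant family involves, here is a minimal, hedged NumPy sketch: per-channel smoothing that folds activation outlier scales into the weights, followed by symmetric group-wise round-to-nearest INT4 quantization. This is an illustration of the general technique, not the authors' SmoothQuant+ implementation; the function names, the alpha smoothing exponent, and the group size of 128 are assumptions made for the example.

```python
# Hedged sketch of SmoothQuant-style 4-bit weight-only quantization.
# NOT the paper's exact algorithm; names and hyperparameters are illustrative.
import numpy as np

def smooth_weights(W, act_absmax, alpha=0.5, eps=1e-8):
    """Scale each input channel of W by s_j = act_absmax_j**alpha / w_absmax_j**(1-alpha).

    W: (out_features, in_features) FP weight matrix.
    act_absmax: (in_features,) per-channel max |activation| from calibration.
    At runtime the matching activations are divided by s (folded into the
    previous layer), so the product X @ W.T is mathematically unchanged.
    """
    w_absmax = np.abs(W).max(axis=0) + eps
    s = (act_absmax + eps) ** alpha / w_absmax ** (1 - alpha)
    return W * s, s  # smoothed weights and per-channel scales

def quantize_int4_groupwise(W, group_size=128):
    """Symmetric round-to-nearest INT4 quantization with per-group scales."""
    out_f, in_f = W.shape
    Wg = W.reshape(out_f, in_f // group_size, group_size)
    scale = np.abs(Wg).max(axis=-1, keepdims=True) / 7.0  # int4 range [-8, 7]
    q = np.clip(np.round(Wg / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an FP32 approximation of the smoothed weights."""
    return (q.astype(np.float32) * scale).reshape(q.shape[0], -1)

# Toy usage: quantize a random layer and check the reconstruction error.
rng = np.random.default_rng(0)
W = rng.normal(size=(256, 512)).astype(np.float32)
act_absmax = np.abs(rng.normal(size=(512,))).astype(np.float32) * 10
W_s, s = smooth_weights(W, act_absmax)
q, scale = quantize_int4_groupwise(W_s)
err = np.abs(dequantize(q, scale) - W_s).mean()
print(f"mean abs quantization error: {err:.4f}")
```

The point of the smoothing step is that dividing outlier-heavy activation channels by s (and multiplying the corresponding weight columns by s) leaves the layer output unchanged while making the weights easier to represent in 4 bits; the group-wise scales then bound the quantization error per block of 128 weights.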

Quantization
