2 code implementations • 6 Dec 2023 • Jiayi Pan, Chengcan Wang, Kaifu Zheng, Yangguang Li, Zhenyu Wang, Bin Feng
Our results show that, with SmoothQuant+, the Code Llama-34B model can be quantized and deployed on a single A100 40GB GPU, achieving lossless accuracy and a throughput increase of 1.9 to 4.0 times compared to the FP16 model deployed on two A100 40GB GPUs.
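The memory savings behind fitting a 34B model on one 40GB GPU come from low-bit weight quantization. The following is a toy sketch of symmetric group-wise low-bit weight quantization, the general family of techniques SmoothQuant+ belongs to; it is not the paper's implementation, and the function names and group size are illustrative assumptions.

```python
import numpy as np

def quantize_int4(w, group_size=128):
    # Symmetric per-group 4-bit quantization: each group of
    # `group_size` weights shares one scale factor (illustrative sketch,
    # not the SmoothQuant+ algorithm itself).
    w = w.reshape(-1, group_size)
    # int4 symmetric range is [-8, 7]; map the group max to 7
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q, scale):
    # Reconstruct approximate FP32 weights from int4 codes and scales
    return (q.astype(np.float32) * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=4096).astype(np.float32)  # toy weight tensor
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s)
print(f"max abs error: {np.abs(w - w_hat).max():.6f}")
```

Storing 4-bit codes plus one scale per 128 weights uses roughly a quarter of the memory of FP16 weights, which is what allows a model that needs two GPUs in FP16 to fit on one.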