Search Results for author: Yaoxiu Lian

Found 1 papers, 0 papers with code

Enabling Fast 2-bit LLM on GPUs: Memory Alignment and Asynchronous Dequantization

no code implementations • 28 Nov 2023 • Jinhao Li, Shiyao Li, Jiaming Xu, Shan Huang, Yaoxiu Lian, Jun Liu, Yu Wang, Guohao Dai

Weights are quantized by groups, while the ranges of weights are large in some groups, resulting in large quantization errors and nonnegligible accuracy loss (e. g. >3% for Llama2-7b with 2-bit quantization in GPTQ and Greenbit).

Quantization

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.