no code implementations • 24 Oct 2023 • Yun Li, Lin Niu, Xipeng Zhang, Kai Liu, Jianchen Zhu, Zhanhui Kang
Traditional pruning methods are known to be difficult to apply to Large Language Models (LLMs) for Generative AI because of their unaffordable retraining cost and large computational demands.
no code implementations • 19 Apr 2023 • Lin Niu, Jiawei Liu, Zhihang Yuan, Dawei Yang, Xinggang Wang, Wenyu Liu
Post-training quantization (PTQ) optimizes the quantization parameters under different metrics to minimize the perturbation introduced by quantization.
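A minimal sketch of this idea, not the paper's implementation: for a symmetric 8-bit quantizer, sweep candidate clipping thresholds and keep the scale that minimizes a perturbation metric (MSE here; the metric choice is an illustrative assumption).

```python
# Metric-driven PTQ sketch: search clipping thresholds for a symmetric
# 8-bit quantizer, picking the scale with the lowest quantization MSE.
import numpy as np

def fake_quant(x, scale, n_bits=8):
    qmax = 2 ** (n_bits - 1) - 1
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

def search_scale(x, n_bits=8, candidates=100):
    qmax = 2 ** (n_bits - 1) - 1
    max_abs = np.abs(x).max()
    best_scale, best_err = None, np.inf
    for i in range(1, candidates + 1):
        clip = max_abs * i / candidates          # candidate clipping threshold
        scale = clip / qmax
        err = np.mean((x - fake_quant(x, scale, n_bits)) ** 2)  # perturbation metric
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale, best_err

w = np.random.randn(512, 512).astype(np.float32)
scale, mse = search_scale(w)
print(f"chosen scale={scale:.6f}, quantization MSE={mse:.6e}")
```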
1 code implementation • 3 Apr 2023 • Zhihang Yuan, Lin Niu, Jiawei Liu, Wenyu Liu, Xinggang Wang, Yuzhang Shang, Guangyu Sun, Qiang Wu, Jiaxiang Wu, Bingzhe Wu
In this paper, we identify that the challenge in quantizing activations in LLMs arises from varying ranges across channels, rather than solely the presence of outliers.
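The observation can be reproduced with a toy experiment (a hedged illustration, not the paper's method): give each activation channel a different magnitude, with no extreme outliers, and compare per-tensor against per-channel quantization error.

```python
# When activation ranges vary ~100x across channels, one per-tensor scale
# quantizes small-range channels coarsely; per-channel scales recover accuracy.
import numpy as np

rng = np.random.default_rng(0)
channels, tokens, n_bits = 8, 1024, 8
qmax = 2 ** (n_bits - 1) - 1

# Channels with magnitudes from 0.1 to 10 (varying ranges, no outliers).
channel_scales = np.logspace(-1, 1, channels)
acts = rng.standard_normal((tokens, channels)) * channel_scales

def quant_error(x, scale):
    xq = np.clip(np.round(x / scale), -qmax - 1, qmax) * scale
    return np.mean((x - xq) ** 2)

# Per-tensor: a single scale shared by all channels.
per_tensor = quant_error(acts, np.abs(acts).max() / qmax)
# Per-channel: each channel uses its own scale.
per_channel = np.mean([
    quant_error(acts[:, c], np.abs(acts[:, c]).max() / qmax)
    for c in range(channels)
])
print(f"per-tensor MSE: {per_tensor:.3e}  per-channel MSE: {per_channel:.3e}")
```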
1 code implementation • CVPR 2023 • Jiawei Liu, Lin Niu, Zhihang Yuan, Dawei Yang, Xinggang Wang, Wenyu Liu
It determines the quantization parameters using the differences between the network's predictions before and after quantization.
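A minimal sketch of a prediction-difference criterion (an illustrative setup, not the paper's code): choose a weight quantization scale that minimizes the KL divergence between the softmax outputs produced before and after quantization.

```python
# Pick the quantization scale whose quantized predictions stay closest
# (in KL divergence) to the full-precision predictions.
import numpy as np

rng = np.random.default_rng(0)
qmax = 127  # symmetric 8-bit

x = rng.standard_normal((32, 64))    # hypothetical calibration inputs
w = rng.standard_normal((64, 10))    # hypothetical classifier weights

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

p_fp = softmax(x @ w)                # predictions before quantization

best_scale, best_kl = None, np.inf
for frac in np.linspace(0.5, 1.0, 50):          # candidate clipping fractions
    scale = frac * np.abs(w).max() / qmax
    wq = np.clip(np.round(w / scale), -qmax - 1, qmax) * scale
    p_q = softmax(x @ wq)            # predictions after quantization
    kl = np.mean(np.sum(p_fp * np.log(p_fp / (p_q + 1e-12)), axis=1))
    if kl < best_kl:
        best_scale, best_kl = scale, kl

print(f"scale={best_scale:.5f}, prediction-difference KL={best_kl:.3e}")
```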