no code implementations • 24 Oct 2023 • Yun Li, Lin Niu, Xipeng Zhang, Kai Liu, Jianchen Zhu, Zhanhui Kang
Traditional pruning methods are known to be difficult to apply to Large Language Models (LLMs) for Generative AI because of the unaffordable training process and large computational demands they entail.
no code implementations • 25 Mar 2022 • Hanlin Tang, Xipeng Zhang, Kai Liu, Jianchen Zhu, Zhanhui Kang
In this work, we propose MKQ-BERT, which further improves the compression level and uses 4 bits for quantization.