no code implementations • 28 Feb 2024 • Yi Zhang, Fei Yang, Shuang Peng, Fangyu Wang, Aimin Pan
The 4-bit matrix multiplication introduced in the FlattenQuant method can effectively alleviate the compute-bound bottleneck caused by large matrix computations.
no code implementations • 6 Dec 2023 • Fei Yang, Shuang Peng, Ning Sun, Fangyu Wang, Yuanyuan Wang, Fu Wu, Jiezhong Qiu, Aimin Pan
Large language models (LLMs) such as GPT-3, OPT, and LLaMA have demonstrated remarkable accuracy in a wide range of tasks.