no code implementations • 2 Jun 2024 • Aozhong Zhang, Naigang Wang, Yanxia Deng, Xin Li, Zi Yang, Penghang Yin
For example, we achieve a Wikitext2 perplexity of 5.95 on the LLaMA2-70B model for per-channel INT2 weight quantization without incurring any inference overhead.
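For context, per-channel INT2 weight quantization gives each output channel its own quantization parameters and maps every weight onto one of four 2-bit codes. The sketch below is a minimal round-to-nearest illustration of that storage format, not the paper's method (the reported perplexity comes from the paper's own optimization procedure); the function name and toy data are hypothetical.

```python
import numpy as np

def quantize_per_channel_int2(W: np.ndarray):
    """Asymmetric per-channel (per-output-row) INT2 quantization sketch.

    Each row of W gets its own scale and zero point, mapping weights
    onto the four unsigned 2-bit codes {0, 1, 2, 3}. Dequantization
    reconstructs W_hat = (q - zero) * scale.
    """
    levels = 2**2 - 1                           # 3: span of the 2-bit grid
    wmin = W.min(axis=1, keepdims=True)
    wmax = W.max(axis=1, keepdims=True)
    scale = np.maximum(wmax - wmin, 1e-8) / levels
    zero = np.round(-wmin / scale)              # per-row zero point
    q = np.clip(np.round(W / scale) + zero, 0, levels).astype(np.uint8)
    return q, scale, zero

# Toy usage: quantize a random weight matrix and check the error.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8)).astype(np.float32)
q, scale, zero = quantize_per_channel_int2(W)
W_hat = (q.astype(np.float32) - zero) * scale
print("max abs reconstruction error:", np.abs(W - W_hat).max())
```

Because the scales and zero points are fixed per channel, dequantization folds into the existing matmul, which is why this format adds no inference overhead.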