no code implementations • 27 Mar 2025 • Yujie Chen, Haotong Qin, Zhang Zhang, Michelo Magno, Luca Benini, Yawei Li
While low-bit quantization is an efficient model compression strategy for reducing size and accelerating IR tasks, SSM suffers substantial performance drops at ultra-low bit-widths (2-4 bits), primarily due to outliers that exacerbate quantization error.