1 code implementation • 17 Dec 2024 • Mingjia Shi, Yuhao Zhou, Ruiji Yu, Zekai Li, Zhiyuan Liang, Xuanlei Zhao, Xiaojiang Peng, Tanmay Rajpurohit, Shanmukha Ramakrishna Vedantam, Wangbo Zhao, Kai Wang, Yang You
Re-training the token-reduced model improves Mamba's performance by effectively rebuilding the key knowledge lost to token reduction.