no code implementations • 13 Jun 2024 • Ping Chen, Wenjie Zhang, Shuibing He, Yingjie Gu, Zhuwei Peng, Kexin Huang, Xuan Zhan, Weijian Chen, Yi Zheng, Zhefeng Wang, Yanlong Yin, Gang Chen
Our comprehensive evaluation using GPT models with 1.3B to 20B parameters shows that both OPT and HEU outperform state-of-the-art recomputation approaches (e.g., Megatron-LM and Checkmate) by 1.02x to 1.53x.