no code implementations • 20 Aug 2024 • Guanchen Li, Xiandong Zhao, Lian Liu, Zeping Li, Dong Li, Lu Tian, Jie He, Ashish Sirasao, Emad Barsoum
Next, we reconstruct a dense model featuring a pruning-friendly weight distribution by reactivating pruned connections with sparse regularization.
no code implementations • 19 Jun 2024 • Zeping Li, Xinlong Yang, Ziheng Gao, Ji Liu, Guanchen Li, Zhuang Liu, Dong Li, Jinzhang Peng, Lu Tian, Emad Barsoum
On MT-Bench, Amphista delivers up to 2.75$\times$ speedup over vanilla autoregressive decoding and 1.40$\times$ over Medusa on Vicuna 33B in wall-clock time.
no code implementations • 7 Apr 2024 • YuHang Zhou, Zeping Li, Siyu Tian, Yuchen Ni, Sen Liu, Guangnan Ye, Hongfeng Chai
Large language models (LLMs) are increasingly being applied across specialized fields, leveraging their extensive knowledge to support a wide range of scenarios within these domains.