no code implementations • 20 Aug 2024 • Guanchen Li, Xiandong Zhao, Lian Liu, Zeping Li, Dong Li, Lu Tian, Jie He, Ashish Sirasao, Emad Barsoum
Next, we reconstruct a dense model featuring a pruning-friendly weight distribution by reactivating pruned connections with sparse regularization.
no code implementations • 19 Jun 2024 • Zeping Li, Xinlong Yang, Ziheng Gao, Ji Liu, Guanchen Li, Zhuang Liu, Dong Li, Jinzhang Peng, Lu Tian, Emad Barsoum
On MT-Bench, Amphista delivers up to 2.75$\times$ speedup over vanilla autoregressive decoding and 1.40$\times$ over Medusa on Vicuna 33B in wall-clock time.