Search Results for author: Guanchen Li

Found 2 papers, 0 papers with code

Enhancing One-shot Pruned Pre-trained Language Models through Sparse-Dense-Sparse Mechanism

no code implementations · 20 Aug 2024 · Guanchen Li, Xiandong Zhao, Lian Liu, Zeping Li, Dong Li, Lu Tian, Jie He, Ashish Sirasao, Emad Barsoum

Next, we reconstruct a dense model featuring a pruning-friendly weight distribution by reactivating pruned connections with sparse regularization.

Amphista: Bi-directional Multi-head Decoding for Accelerating LLM Inference

no code implementations · 19 Jun 2024 · Zeping Li, Xinlong Yang, Ziheng Gao, Ji Liu, Guanchen Li, Zhuang Liu, Dong Li, Jinzhang Peng, Lu Tian, Emad Barsoum

On MT-Bench, Amphista delivers up to 2.75× speedup over vanilla autoregressive decoding and 1.40× over Medusa on Vicuna 33B in wall-clock time.
