Search Results for author: Xiangyu Jiang

Found 1 paper, 1 paper with code

SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models

1 code implementation • 29 Oct 2023 • Zhixu Du, Shiyu Li, Yuhao Wu, Xiangyu Jiang, Jingwei Sun, Qilin Zheng, Yongkai Wu, Ang Li, Hai "Helen" Li, Yiran Chen

Specifically, SiDA-MoE attains a remarkable speedup in MoE inference, with up to a $3.93\times$ throughput increase, up to $72\%$ latency reduction, and up to $80\%$ GPU memory savings, at the cost of a performance drop as small as $1\%$.
