Search Results for author: Xiangyu Jiang

Found 1 paper, 0 papers with code

SiDA: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models

no code implementations • 29 Oct 2023 • Zhixu Du, Shiyu Li, Yuhao Wu, Xiangyu Jiang, Jingwei Sun, Qilin Zheng, Yongkai Wu, Ang Li, Hai "Helen" Li, Yiran Chen

Specifically, SiDA attains a remarkable speedup in MoE inference, with up to a 3.93X throughput increase, up to 75% latency reduction, and up to 80% GPU memory saving, at a performance drop as low as 1%.
