1 code implementation • 24 Jun 2024 • Ruoyu Qin, Zheming Li, Weiran He, Mingxing Zhang, Yongwei Wu, Weimin Zheng, Xinran Xu
Compared to the baseline method, Mooncake can achieve up to a 525% increase in throughput in certain simulated scenarios while adhering to SLOs.
no code implementations • 3 May 2024 • Shaoyuan Chen, Wencong Xiao, Yutong Lin, Mingxing Zhang, Yingdi Shan, Jinlei Jiang, Kang Chen, Yongwei Wu
To enhance the efficiency of LLM decoding, we introduce model-attention disaggregation.