Search Results for author: Yongwei Wu

Found 2 papers, 1 paper with code

Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving

1 code implementation · 24 Jun 2024 · Ruoyu Qin, Zheming Li, Weiran He, Mingxing Zhang, Yongwei Wu, Weimin Zheng, Xinran Xu

Compared with the baseline method, Mooncake can achieve up to a 525% increase in throughput in certain simulated scenarios while adhering to service level objectives (SLOs).
