Search Results for author: Yanqi Zhang

Found 5 papers, 0 papers with code

Unifying KV Cache Compression for Large Language Models with LeanKV

no code implementations · 4 Dec 2024 · Yanqi Zhang, Yuwei Hu, Runyuan Zhao, John C. S. Lui, Haibo Chen

Large language models (LLMs) demonstrate exceptional performance but incur high serving costs due to substantial memory demands, with the key-value (KV) cache being a primary bottleneck.

Quantization

Sinan: Data-Driven, QoS-Aware Cluster Management for Microservices

no code implementations · 27 May 2021 · Yanqi Zhang, Weizhe Hua, Zhuangzhuang Zhou, Edward Suh, Christina Delimitrou

Cloud applications are increasingly shifting from large monolithic services to large numbers of loosely coupled, specialized microservices.

Management

Leveraging Deep Learning to Improve the Performance Predictability of Cloud Microservices

no code implementations · 2 May 2019 · Yu Gan, Yanqi Zhang, Kelvin Hu, Dailun Cheng, Yuan He, Meghna Pancholi, Christina Delimitrou

We show that Seer correctly anticipates QoS violations 91% of the time, and avoids the QoS violation to begin with in 84% of cases.
