1 code implementation • 8 Mar 2024 • Hao Kang, Qingru Zhang, Souvik Kundu, Geonhwa Jeong, Zaoxing Liu, Tushar Krishna, Tuo Zhao
Key-value (KV) caching has become the de facto technique for accelerating generation in large language model (LLM) inference.
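The idea behind KV caching: in autoregressive decoding, each step's attention needs the keys and values of all previous tokens, so storing them once avoids recomputing them at every step. A minimal single-head sketch (all names here are illustrative, not from the paper):

```python
import math

def attend(query, keys, values):
    # Scaled dot-product attention of one query over all cached positions.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(len(query))
              for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

def decode_with_cache(steps):
    # steps: list of (query, key, value) vectors, one per decoding step.
    kv_cache = {"keys": [], "values": []}
    outputs = []
    for query, key, value in steps:
        # Append this step's key/value once; reuse all earlier entries
        # instead of recomputing them from scratch each step.
        kv_cache["keys"].append(key)
        kv_cache["values"].append(value)
        outputs.append(attend(query, kv_cache["keys"], kv_cache["values"]))
    return outputs
```

The cache grows linearly with sequence length, which is why its memory footprint becomes a bottleneck for long contexts and motivates compression work such as the paper above.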
no code implementations • 5 Nov 2019 • Zaoxing Liu, Tian Li, Virginia Smith, Vyas Sekar
Federated learning methods run training tasks directly on user devices and do not share the raw user data with third parties.
no code implementations • 3 Nov 2019 • Tian Li, Zaoxing Liu, Vyas Sekar, Virginia Smith
Many existing works treat these concerns separately.