Search Results for author: Xingyang He

Found 1 paper, 0 papers with code

Task-KV: Task-aware KV Cache Optimization via Semantic Differentiation of Attention Heads

no code implementations • 25 Jan 2025 • Xingyang He, Jie Liu, Shaowei Chen

To address this issue, we propose Task-KV, a method that leverages the semantic differentiation of attention heads to allocate differentiated KV cache budgets across various tasks.
