no code implementations • 25 Jan 2025 • Xingyang He, Jie Liu, Shaowei Chen
To address this issue, we propose Task-KV, a method that leverages the semantic differentiation of attention heads to allocate differentiated KV cache budgets across various tasks.
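The idea of giving attention heads different KV cache budgets according to how semantically differentiated they are can be illustrated with a small sketch. This is not Task-KV's actual algorithm; the excerpt does not specify it. Everything here is a labeled assumption. Each head is represented by a hypothetical "semantic vector". Its differentiation score is its Euclidean distance from the mean of all head vectors. The total budget is split so that every head gets a floor and the rest is shared in proportion to that score. The function name `allocate_kv_budgets` and its parameters are hypothetical.

```python
import math

def allocate_kv_budgets(head_vectors, total_budget, min_budget=4):
    """Hypothetical sketch: split total_budget KV slots across heads,
    giving more slots to heads farther from the semantic center."""
    n = len(head_vectors)
    dim = len(head_vectors[0])
    # Assumed "semantic center": element-wise mean of all head vectors.
    center = [sum(v[d] for v in head_vectors) / n for d in range(dim)]
    # Assumed differentiation score: Euclidean distance from the center.
    scores = [math.dist(v, center) for v in head_vectors]
    # Every head keeps a minimum budget; the remainder is distributed.
    remaining = total_budget - min_budget * n
    if remaining < 0:
        raise ValueError("total_budget too small for min_budget per head")
    total_score = sum(scores)
    if total_score == 0:
        # All heads identical: split the remainder evenly.
        shares = [remaining / n] * n
    else:
        shares = [remaining * s / total_score for s in scores]
    budgets = [min_budget + int(sh) for sh in shares]
    # Give slots lost to rounding to the most differentiated heads first.
    leftover = total_budget - sum(budgets)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    for i in order[:leftover]:
        budgets[i] += 1
    return budgets

# Usage: three similar heads and one outlier head sharing 64 slots.
heads = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [3.0, 3.0]]
print(allocate_kv_budgets(heads, 64))
```

In this toy setup the outlier head receives the largest share, and the budgets always sum to `total_budget`. Proportional splitting with a floor is only one plausible design. A scheme that keeps the full cache for highly differentiated heads and a small fixed window for the rest would be another.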