1 code implementation • 23 Feb 2024 • Lu Ye, Ze Tao, Yong Huang, Yang Li
In this paper, we introduce ChunkAttention, a prefix-aware self-attention module that detects matching prompt prefixes across multiple requests and shares their key/value tensors in memory at runtime, improving the memory utilization of the KV cache.
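To make the idea concrete, below is a minimal, illustrative sketch (not the authors' implementation) of how a chunked prefix tree can let requests with a common prompt prefix reference the same KV-cache chunks instead of storing duplicates. The chunk size, node layout, and the `KVChunk` placeholder are assumptions made for this example.

```python
# Sketch of prefix-aware KV-cache sharing via a chunked prefix tree.
# Assumption: KVChunk stands in for the real per-chunk key/value tensors.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

CHUNK_SIZE = 4  # tokens per chunk; real systems use larger chunks


@dataclass
class KVChunk:
    """Placeholder for the key/value tensors of one chunk of tokens."""
    tokens: Tuple[int, ...]
    ref_count: int = 0  # number of live requests sharing this chunk


@dataclass
class TrieNode:
    chunk: KVChunk = None
    children: Dict[Tuple[int, ...], "TrieNode"] = field(default_factory=dict)


class ChunkedPrefixCache:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, prompt: List[int]) -> List[KVChunk]:
        """Walk the trie chunk by chunk; reuse KV chunks for matching
        prefixes and allocate new ones only where the prompt diverges."""
        node, chunks = self.root, []
        for i in range(0, len(prompt) - len(prompt) % CHUNK_SIZE, CHUNK_SIZE):
            key = tuple(prompt[i:i + CHUNK_SIZE])
            if key not in node.children:
                node.children[key] = TrieNode(chunk=KVChunk(tokens=key))
            node = node.children[key]
            node.chunk.ref_count += 1
            chunks.append(node.chunk)
        return chunks  # KV chunks this request will read during attention


if __name__ == "__main__":
    cache = ChunkedPrefixCache()
    # Two requests that share the same 8-token system-prompt prefix.
    a = cache.insert([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
    b = cache.insert([1, 2, 3, 4, 5, 6, 7, 8, 20, 21, 22, 23])
    # The first two chunks are the exact same objects, i.e. stored once.
    print(a[0] is b[0], a[1] is b[1], a[2] is b[2])  # True True False
```

In this sketch, the shared prefix chunks are stored once and reference-counted, so memory for common system prompts is not duplicated across requests; the attention kernel would then read a request's KV data as the list of chunks returned by `insert`.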