Search Results for author: Yutao Xu

Efficient LLM inference solution on Intel GPU

A customized Scaled-Dot-Product-Attention kernel is designed to match our fusion policy based on the segment KV cache solution.
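As a rough illustration of the idea in the snippet above, the sketch below computes scaled dot-product attention for a single query over a KV cache held in separate segments. It is not the paper's kernel. It assumes that "segment KV cache" means keys and values are stored in more than one buffer (here, a hypothetical prompt segment and a hypothetical response segment), and it uses a running-max softmax so the segments never need to be concatenated. The names sdpa, sdpa_segmented, prompt_segment, and response_segment are invented for this example.

```python
import math


def sdpa(query, keys, values):
    """Reference scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


def sdpa_segmented(query, segments):
    """Attention over a KV cache stored as separate (keys, values) segments.

    Assumption for illustration: each segment is its own buffer. A running
    max and running sum rescale earlier partial results, so the output
    matches attention over the concatenated cache without copying it.
    """
    scale = 1.0 / math.sqrt(len(query))
    running_max = float("-inf")
    running_sum = 0.0
    acc = None
    for keys, values in segments:
        for key, value in zip(keys, values):
            score = scale * sum(q * k for q, k in zip(query, key))
            new_max = max(running_max, score)
            # Rescale what has been accumulated so far to the new max.
            correction = math.exp(running_max - new_max)
            weight = math.exp(score - new_max)
            if acc is None:
                acc = [0.0] * len(value)
            acc = [a * correction + weight * v for a, v in zip(acc, value)]
            running_sum = running_sum * correction + weight
            running_max = new_max
    return [a / running_sum for a in acc]


# Hypothetical data: two cached prompt tokens and one cached response token.
query = [1.0, 0.5]
prompt_segment = ([[0.2, 0.1], [0.9, -0.3]], [[1.0, 0.0], [0.0, 1.0]])
response_segment = ([[0.4, 0.8]], [[0.5, 0.5]])

segmented = sdpa_segmented(query, [prompt_segment, response_segment])
reference = sdpa(query,
                 prompt_segment[0] + response_segment[0],
                 prompt_segment[1] + response_segment[1])
```

Here segmented and reference should agree up to floating-point rounding, which is the property a segment-aware kernel has to preserve. A real GPU kernel would process many queries and heads in parallel and fuse this with neighboring operations; none of that is modeled here.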

