Search Results for author: Yutao Xu

Found 1 paper, 0 papers with code

Efficient LLM inference solution on Intel GPU

no code implementations • 19 Dec 2023 • Hui Wu, Yi Gan, Feng Yuan, Jing Ma, Wei Zhu, Yutao Xu, Hong Zhu, Yuhua Zhu, Xiaoli Liu, Jinghui Gu

A customized Scaled-Dot-Product-Attention kernel is designed to match our fusion policy based on the segment KV cache solution.

Management
