Search Results for author: Zekun Yin

Found 1 papers, 0 papers with code

FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs

no code implementations • 22 Oct 2024 • Haoran Lin, Xianzhi Yu, Kang Zhao, Lu Hou, Zongyuan Zhan, Stanislav Kamenev, Han Bao, Ting Hu, Mingkai Wang, Qixin Chang, Siyue Sui, Weihao Sun, Jiaxin Hu, Jun Yao, Zekun Yin, Cheng Qian, Ying Zhang, Yinfei Pan, Yu Yang, Weiguo Liu

In this work, we propose FastAttention, which pioneers the adaptation of the FlashAttention series to NPUs and low-resource GPUs to boost LLM inference efficiency.
