Search Results for author: Zhiying Wu

Found 4 papers, 3 papers with code

INT-FlashAttention: Enabling Flash Attention for INT8 Quantization

1 code implementation • 25 Sep 2024 • Shimao Chen, Zirui Liu, Zhiying Wu, Ce Zheng, Peizhuang Cong, Zihan Jiang, Yuhan Wu, Lei Su, Tong Yang

As the foundation of large language models (LLMs), the self-attention module faces the challenge of quadratic time and memory complexity with respect to sequence length.

Quantization
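
To illustrate the quadratic cost the abstract refers to, here is a minimal sketch of standard scaled dot-product attention (not the paper's INT8 FlashAttention kernel): the full seq_len × seq_len score matrix is what drives the O(n²) time and memory. Shapes and names are illustrative assumptions.

```python
import torch

def naive_attention(q, k, v):
    """Standard scaled dot-product attention.

    Materializing the (seq_len x seq_len) score matrix is what makes
    plain self-attention quadratic in sequence length; FlashAttention-
    style kernels avoid storing it, and INT-FlashAttention additionally
    performs the computation under INT8 quantization.
    """
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5  # (n, n) matrix: O(n^2) memory
    return torch.softmax(scores, dim=-1) @ v

# Hypothetical shapes for illustration: batch 1, seq_len 1024, head dim 64.
q = k = v = torch.randn(1, 1024, 64)
out = naive_attention(q, k, v)  # out.shape == (1, 1024, 64)
```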
