26 Feb 2025 • Long Cheng, Qichen Liao, Fan Wu, Junlin Mu, Tengfei Han, Zhe Qiu, Lianqiang Li, Tianyi Liu, Fangzheng Miao, Keming Gao, Liang Wang, Zhen Zhang, Qiande Yin
To accelerate this process, we developed PASA, a low-precision, mathematically equivalent algorithm based on Flash Attention.