14 Apr 2024 • Dongseong Hwang, Weiran Wang, Zhuoyuan Huo, Khe Chai Sim, Pedro Moreno Mengibar
While Transformers have revolutionized deep learning, their quadratic attention complexity hinders their ability to process infinitely long inputs.