no code implementations • 1 Sep 2022 • Amir Yazdanbakhsh, Ashkan Moradifirouzabadi, Zheng Li, Mingu Kang
The combined in-memory pruning and on-chip recompute of the relevant attention scores enables SPRINT to transform quadratic complexity to a merely linear one.