1 code implementation • 25 Jun 2023 • Marcelo Archanjo Jose, Fabio Gagliardi Cozman
Long sequences of text are challenging for transformers because memory use in the self-attention mechanism grows quadratically with sequence length.
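The quadratic cost can be seen in a minimal sketch of single-head self-attention: for an input of length n, the score matrix has shape (n, n), so doubling the sequence length quadruples the memory needed for that intermediate. The NumPy implementation below is a simplified illustration (it reuses the input as queries, keys, and values; a real layer would apply learned projections), not the method of the paper.

```python
import numpy as np

def self_attention(x):
    """Naive single-head self-attention over x of shape (n, d).

    The intermediate score matrix has shape (n, n), which is why
    memory grows quadratically with sequence length n.
    """
    n, d = x.shape
    # Illustration only: use x itself as queries, keys, and values.
    scores = x @ x.T / np.sqrt(d)  # (n, n) -- the quadratic term
    # Numerically stable softmax over each row.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x             # (n, d)

x = np.random.default_rng(0).normal(size=(512, 64))
out = self_attention(x)
print(out.shape)  # (512, 64); the (512, 512) score matrix was transient
```

For n = 512 the score matrix holds 512 × 512 = 262,144 entries; at n = 4096 it holds over 16 million, which is the scaling problem the abstract refers to.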