The original self-attention component in the Transformer architecture has $O\left(n^{2}\right)$ time and memory complexity, where $n$ is the input sequence length, and therefore does not scale efficiently to long inputs. Attention pattern methods aim to reduce this complexity by restricting each query to attend only to a subset of the key positions.
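As an illustration (not from the source), the sketch below applies one common attention pattern, a sliding-window mask, on top of standard scaled dot-product attention so that each query attends only to keys within a local window of size $w$. The function name, the `window` parameter, and the toy shapes are assumptions made for the example.

```python
# Minimal sketch, assuming a banded "sliding window" attention pattern.
# Each query attends only to keys within +/- `window` positions, so the
# number of attended keys per query drops from n to at most 2*window + 1.
import numpy as np

def sliding_window_attention(q, k, v, window):
    # q, k, v: (n, d) arrays; window: one-sided local window size (assumed parameter)
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)                            # (n, n) scaled dot-product scores
    idx = np.arange(n)
    mask = np.abs(idx[:, None] - idx[None, :]) <= window     # banded local pattern
    scores = np.where(mask, scores, -np.inf)                 # block positions outside the window
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)           # softmax over allowed positions
    return weights @ v                                       # (n, d) attention output

# Usage: with n = 8 tokens and window = 2, each query sees at most 5 keys.
rng = np.random.default_rng(0)
q = k = v = rng.standard_normal((8, 16))
out = sliding_window_attention(q, k, v, window=2)
print(out.shape)  # (8, 16)
```

Note that this sketch still materializes the full $n \times n$ score matrix for clarity; efficient implementations compute only the banded entries and thus avoid the quadratic cost.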
| Method | Year | Papers |
|---|---|---|
|  | 2019 | 603 |
|  | 2019 | 602 |
|  | 2020 | 61 |
|  | 2020 | 60 |
|  | 2020 | 59 |
|  | 2020 | 12 |
|  | 2022 | 8 |
|  | 2020 | 4 |