Natural Language Processing • Attention Mechanisms • 7 methods
The original self-attention component in the Transformer architecture has $O\left(n^{2}\right)$ time and memory complexity, where $n$ is the input sequence length, and therefore does not scale efficiently to long inputs. Attention pattern methods aim to reduce this complexity by having each query attend to only a subset of positions rather than the full sequence.
| Method | Year | Papers |
|---|---|---|
| | 2019 | 163 |
| | 2019 | 163 |
| | 2020 | 34 |
| | 2020 | 33 |
| | 2020 | 33 |
| | 2020 | 8 |
| | 2020 | 3 |
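As a rough illustration of the idea (not tied to any specific method listed above), the sketch below contrasts standard dense attention, which computes an $n \times n$ score matrix, with a local sliding-window pattern in which each query only attends to keys within a fixed window, reducing the cost to roughly $O(n \cdot w)$. The function names `full_attention`, `local_attention`, and the `window` parameter are illustrative assumptions, not part of any library API.

```python
# Minimal sketch: dense vs. windowed (local) attention patterns.
# Assumed, illustrative code; not an implementation of any specific published method.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def full_attention(q, k, v):
    # Standard scaled dot-product attention: an (n, n) score matrix, O(n^2).
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    return softmax(scores) @ v

def local_attention(q, k, v, window=4):
    # Each query attends only to keys within +/- `window` positions,
    # so at most (2 * window + 1) scores are computed per query: O(n * window).
    n, d = q.shape
    out = np.empty_like(v)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        scores = q[i] @ k[lo:hi].T / np.sqrt(d)
        out[i] = softmax(scores) @ v[lo:hi]
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 16, 8
    q, k, v = rng.standard_normal((3, n, d))
    print(full_attention(q, k, v).shape)        # (16, 8)
    print(local_attention(q, k, v, window=4).shape)  # (16, 8), cheaper score computation
```

The loop is written for clarity; practical implementations compute the banded scores with blocked or batched operations, and many methods in this category combine such local windows with a few global or strided positions to retain long-range connectivity.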