A Transformer is a model architecture that eschews recurrence and instead relies entirely on an attention mechanism to draw global dependencies between input and output. Before Transformers, the dominant sequence transduction models were based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The Transformer also employs an encoder and decoder, but removing recurrence in favor of attention mechanisms allows for significantly more parallelization than methods like RNNs and CNNs.
Source: Attention Is All You NeedPaper | Code | Results | Date | Stars |
---|
Task | Papers | Share |
---|---|---|
Language Modelling | 37 | 4.95% |
Diversity | 18 | 2.41% |
Image Classification | 18 | 2.41% |
Decoder | 18 | 2.41% |
Large Language Model | 17 | 2.28% |
Decision Making | 16 | 2.14% |
In-Context Learning | 16 | 2.14% |
Retrieval | 15 | 2.01% |
Object Detection | 14 | 1.87% |