Primer is a Transformer-based architecture, found through neural architecture search, that improves on the original Transformer with two modifications: squared ReLU activations in the feedforward block, and depthwise convolutions added after each of the multi-head attention projections (queries, keys, and values), yielding a new module called Multi-DConv-Head Attention.
Source: Primer: Searching for Efficient Transformers for Language Modeling
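Below is a minimal PyTorch sketch of the two modifications. The kernel width of 3 and the depthwise (per-channel) convolution over the sequence dimension follow the paper's description, but the names (`squared_relu`, `MultiDConvHeadAttention`), the causal left-padding, and the hyperparameters are illustrative assumptions rather than the official implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def squared_relu(x: torch.Tensor) -> torch.Tensor:
    """Primer's feedforward activation: ReLU followed by squaring."""
    return F.relu(x) ** 2


class MultiDConvHeadAttention(nn.Module):
    """Multi-head attention where each of the Q, K, V projections is
    followed by a depthwise convolution along the sequence dimension.
    A sketch, not the official Primer code."""

    def __init__(self, d_model: int, num_heads: int, kernel_size: int = 3):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = d_model // num_heads
        self.kernel_size = kernel_size
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        # groups == channels makes the convolution depthwise: every
        # channel (i.e. every per-head feature) gets its own kernel.
        self.q_conv = nn.Conv1d(d_model, d_model, kernel_size, groups=d_model)
        self.k_conv = nn.Conv1d(d_model, d_model, kernel_size, groups=d_model)
        self.v_conv = nn.Conv1d(d_model, d_model, kernel_size, groups=d_model)

    def _dconv(self, x: torch.Tensor, conv: nn.Conv1d) -> torch.Tensor:
        # x: (batch, seq, d_model). Pad on the left so the convolution
        # is causal (no lookahead), an assumption for decoder-style use.
        x = x.transpose(1, 2)                    # (batch, d_model, seq)
        x = F.pad(x, (self.kernel_size - 1, 0))  # causal left padding
        return conv(x).transpose(1, 2)           # (batch, seq, d_model)

    def forward(self, x: torch.Tensor, mask: torch.Tensor = None) -> torch.Tensor:
        b, t, d = x.shape
        q = self._dconv(self.q_proj(x), self.q_conv)
        k = self._dconv(self.k_proj(x), self.k_conv)
        v = self._dconv(self.v_proj(x), self.v_conv)
        # Split into heads: (batch, heads, seq, head_dim).
        q, k, v = (y.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
                   for y in (q, k, v))
        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        if mask is not None:
            attn = attn.masked_fill(mask, float("-inf"))
        out = attn.softmax(dim=-1) @ v
        out = out.transpose(1, 2).reshape(b, t, d)
        return self.out_proj(out)


# Usage example with illustrative sizes:
mdha = MultiDConvHeadAttention(d_model=512, num_heads=8)
x = torch.randn(2, 16, 512)
print(mdha(x).shape)  # torch.Size([2, 16, 512])
```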
Task | Papers | Share
---|---|---
Language Modelling | 4 | 33.33% |
Sentiment Analysis | 1 | 8.33% |
Common Sense Reasoning | 1 | 8.33% |
Coreference Resolution | 1 | 8.33% |
Natural Language Inference | 1 | 8.33% |
Question Answering | 1 | 8.33% |
Text Classification | 1 | 8.33% |
Word Sense Disambiguation | 1 | 8.33% |
Specificity | 1 | 8.33% |
Component | Type
---|---
Dense Connections | Feedforward Networks
Multi-DConv-Head Attention | Attention Modules
Squared ReLU | Activation Functions