Autoregressive Transformers

Primer is a Transformer-based architecture that improves on the original Transformer with two modifications found through neural architecture search: squaring the ReLU activations in the feedforward block, and adding depthwise convolutions after each of the multi-head attention projections (queries, keys, and values). The modified attention module is called Multi-DConv-Head Attention (MDHA).

Source: Primer: Searching for Efficient Transformers for Language Modeling
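
Both modifications are simple to express in code. Below is a minimal PyTorch sketch, not the paper's implementation: the squared-ReLU activation, the 3-wide depthwise convolutions on the Q/K/V projections, and the causal (left-only) padding follow the paper's description, while the class and parameter names and the use of `scaled_dot_product_attention` are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def squared_relu(x: torch.Tensor) -> torch.Tensor:
    """Primer's feedforward activation: ReLU followed by squaring."""
    return F.relu(x) ** 2


class SquaredReLUFFN(nn.Module):
    """Feedforward block with the squared-ReLU activation (names are illustrative)."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_ff)
        self.fc2 = nn.Linear(d_ff, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc2(squared_relu(self.fc1(x)))


class MultiDConvHeadAttention(nn.Module):
    """Multi-DConv-Head Attention sketch: a 3-wide depthwise convolution is
    applied along the sequence dimension to each of the query, key, and
    value projections before the usual attention computation."""

    def __init__(self, d_model: int, num_heads: int, kernel_size: int = 3):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = d_model // num_heads
        self.kernel_size = kernel_size
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        # groups == channels makes each Conv1d depthwise (one filter per channel).
        self.q_conv = nn.Conv1d(d_model, d_model, kernel_size, groups=d_model)
        self.k_conv = nn.Conv1d(d_model, d_model, kernel_size, groups=d_model)
        self.v_conv = nn.Conv1d(d_model, d_model, kernel_size, groups=d_model)

    def _dconv(self, x: torch.Tensor, conv: nn.Conv1d) -> torch.Tensor:
        # x: (batch, seq, d_model) -> depthwise conv over seq with causal
        # left-padding (an assumption appropriate for autoregressive models).
        x = x.transpose(1, 2)                    # (batch, d_model, seq)
        x = F.pad(x, (self.kernel_size - 1, 0))  # pad on the left only
        return conv(x).transpose(1, 2)           # back to (batch, seq, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, d = x.shape
        q = self._dconv(self.q_proj(x), self.q_conv)
        k = self._dconv(self.k_proj(x), self.k_conv)
        v = self._dconv(self.v_proj(x), self.v_conv)
        # Split into heads: (batch, heads, seq, head_dim).
        q, k, v = (t.view(b, s, self.num_heads, self.head_dim).transpose(1, 2)
                   for t in (q, k, v))
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(b, s, d)
        return self.out_proj(out)
```

In Primer, MDHA stands in for standard multi-head attention and squared ReLU replaces the usual ReLU or GELU in each feedforward block; stacking these two modules with the usual residual connections and normalization recovers the rest of the architecture.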

Tasks


| Task | Papers | Share |
| --- | --- | --- |
| Language Modeling | 4 | 18.18% |
| Language Modelling | 4 | 18.18% |
| Safety Alignment | 1 | 4.55% |
| TAR | 1 | 4.55% |
| Epidemiology | 1 | 4.55% |
| Protein Structure Prediction | 1 | 4.55% |
| Sentiment Analysis | 1 | 4.55% |
| Diversity | 1 | 4.55% |
| Common Sense Reasoning | 1 | 4.55% |
