RealFormer is a Transformer variant built on the idea of residual attention. It uses a Post-LN style Transformer as its backbone and adds skip edges that connect the Multi-Head Attention modules in adjacent layers, creating a direct path for each type of attention module. These skip edges introduce no new parameters or hyper-parameters.
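A minimal sketch of the residual-attention idea, assuming the common formulation in which each layer's raw (pre-softmax) attention scores are added to the next layer's scores; the function and variable names here are illustrative, not from the paper's code:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def residual_attention(q, k, v, prev_scores=None):
    """Scaled dot-product attention with a RealFormer-style skip edge.

    The previous layer's raw attention logits (prev_scores) are added to
    this layer's logits before the softmax, giving a direct path between
    attention modules in adjacent layers without any new parameters.
    """
    d = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d)  # this layer's logits
    if prev_scores is not None:
        scores = scores + prev_scores             # residual skip edge
    out = softmax(scores) @ v
    return out, scores  # scores are forwarded to the next layer

# Toy two-layer stack sharing the residual score path (self-attention).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))   # 4 tokens, model dim 8
out1, s1 = residual_attention(x, x, x)
out2, s2 = residual_attention(out1, out1, out1, prev_scores=s1)
```

In a full model each layer would also have its own projection matrices and feed-forward block; only the score-passing path shown here is what RealFormer adds.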
Source: RealFormer: Transformer Likes Residual Attention
| Task | Papers | Share |
|---|---|---|
| Language Modelling | 1 | 11.11% |
| Linguistic Acceptability | 1 | 11.11% |
| Machine Translation | 1 | 11.11% |
| Natural Language Inference | 1 | 11.11% |
| Natural Questions | 1 | 11.11% |
| Paraphrase Identification | 1 | 11.11% |
| Semantic Textual Similarity | 1 | 11.11% |
| Sentiment Analysis | 1 | 11.11% |
| Translation | 1 | 11.11% |