RealFormer: Transformer Likes Residual Attention

Transformer is the backbone of modern NLP models. In this paper, we propose RealFormer, a simple and generic technique to create Residual Attention Layer Transformer networks that significantly outperform the canonical Transformer and its variants (BERT, ETC, etc.) on a wide spectrum of tasks including Masked Language Modeling, GLUE, SQuAD, Neural Machine Translation, WikiHop, HotpotQA, Natural Questions, and OpenKP. We also observe empirically that RealFormer stabilizes training and leads to models with sparser attention. Source code and pre-trained checkpoints for RealFormer can be found at
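The core idea behind RealFormer is to add a residual "skip edge" on the attention scores themselves: each layer's raw, pre-softmax attention logits are added to those of the previous layer before the softmax is applied. The following is a minimal NumPy sketch of that idea for a single attention head; the function names and shapes are illustrative, not the paper's reference implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def residual_attention(q, k, v, prev_scores=None):
    """Single-head attention with RealFormer-style residual attention:
    the previous layer's raw (pre-softmax) scores are added to this
    layer's scores before the softmax. Illustrative sketch only."""
    d = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d)  # (seq, seq) logits
    if prev_scores is not None:
        scores = scores + prev_scores              # residual skip edge
    probs = softmax(scores, axis=-1)
    return probs @ v, scores                       # thread scores onward

# Toy usage: chain two layers, passing the raw scores through.
rng = np.random.default_rng(0)
q = k = v = rng.standard_normal((4, 8))
out1, s1 = residual_attention(q, k, v)
out2, s2 = residual_attention(q, k, v, prev_scores=s1)
```

In a full model, each encoder layer would compute its own queries and keys from its input; the only change relative to a canonical Transformer layer is the extra addition of `prev_scores`, which introduces no new parameters.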

Findings (ACL) 2021
| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Linguistic Acceptability | CoLA | RealFormer | Accuracy | 59.83% | # 24 |
| Semantic Textual Similarity | MRPC | RealFormer | Accuracy | 87.01% | # 26 |
| Semantic Textual Similarity | MRPC | RealFormer | F1 | 90.91% | # 9 |
| Natural Language Inference | MultiNLI | RealFormer | Matched | 86.28 | # 21 |
| Natural Language Inference | MultiNLI | RealFormer | Mismatched | 86.34 | # 13 |
| Natural Language Inference | QNLI | RealFormer | Accuracy | 91.89% | # 24 |
| Natural Language Inference | RTE | RealFormer | Accuracy | 73.65% | # 30 |
| Sentiment Analysis | SST-2 Binary classification | RealFormer | Accuracy | 94.04 | # 33 |
| Semantic Textual Similarity | STS Benchmark | RealFormer | Pearson Correlation | 0.9011 | # 16 |
| Semantic Textual Similarity | STS Benchmark | RealFormer | Spearman Correlation | 0.8988 | # 5 |