Adversarial Self-Attention for Language Understanding

25 Jun 2022 · Hongqiu Wu, Ruixue Ding, Hai Zhao, Pengjun Xie, Fei Huang, Min Zhang

Deep neural models (e.g. Transformer) naturally learn spurious features, which create a "shortcut" between the labels and inputs, impairing generalization and robustness. This paper advances the self-attention mechanism to a robust variant for Transformer-based pre-trained language models (e.g. BERT). We propose the Adversarial Self-Attention (ASA) mechanism, which adversarially biases the attentions to suppress the model's reliance on specific features (e.g. particular keywords) and to encourage its exploration of broader semantics. We conduct a comprehensive evaluation across a wide range of tasks in both the pre-training and fine-tuning stages. For pre-training, ASA yields remarkable performance gains over naive training, even when naive training runs for longer steps. For fine-tuning, ASA-empowered models outperform naive models by a large margin in both generalization and robustness.
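The abstract does not spell out the paper's exact formulation, but the core idea — perturbing the attention distribution in the direction that most increases the current loss, then running the model under that worst-case attention — can be sketched as below. This is a minimal, illustrative sketch only: the single FGSM-style inner step, the additive bias on pre-softmax attention logits, the `epsilon` budget, and all function names are assumptions for exposition, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def self_attention(q, k, v, bias=None):
    # Scaled dot-product attention; `bias` is added to the
    # pre-softmax attention logits when given.
    d = q.size(-1)
    logits = q @ k.transpose(-2, -1) / d ** 0.5
    if bias is not None:
        logits = logits + bias
    return F.softmax(logits, dim=-1) @ v

def adversarially_biased_attention(q, k, v, loss_fn, epsilon=0.1):
    # One FGSM-style inner step (an assumption, not the paper's exact
    # adversary): find the additive logit bias that most increases the
    # task loss within an epsilon budget, then attend under that bias.
    bias = torch.zeros(q.size(0), q.size(-2), k.size(-2), requires_grad=True)
    loss = loss_fn(self_attention(q, k, v, bias))
    (grad,) = torch.autograd.grad(loss, bias)
    adv_bias = (epsilon * grad.sign()).detach()
    return self_attention(q, k, v, adv_bias)

# Toy usage: the adversarial bias pushes attention away from whatever
# positions the current loss depends on most, so training on this
# output discourages shortcut features.
q = k = v = torch.randn(2, 5, 16)   # (batch, seq, dim)
target = torch.randn(2, 5, 16)
out = adversarially_biased_attention(
    q, k, v, lambda o: F.mse_loss(o, target))
```

Training against such worst-case attention perturbations is what encourages the model to spread its reliance over broader semantics rather than a few keyword positions.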

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Machine Reading Comprehension | DREAM | ASA + RoBERTa | Accuracy | 69.2 | #1 |
| Machine Reading Comprehension | DREAM | ASA + BERT-base | Accuracy | 64.3 | #3 |
| Natural Language Inference | MultiNLI | ASA + RoBERTa | Matched Accuracy | 88 | #15 |
| Natural Language Inference | MultiNLI | ASA + BERT-base | Matched Accuracy | 85 | #28 |
| Natural Language Inference | QNLI | ASA + BERT-base | Accuracy | 91.4% | #28 |
| Natural Language Inference | QNLI | ASA + RoBERTa | Accuracy | 93.6% | #20 |
| Paraphrase Identification | Quora Question Pairs | ASA + RoBERTa | F1 | 73.7 | #8 |
| Paraphrase Identification | Quora Question Pairs | ASA + BERT-base | F1 | 72.3 | #11 |
| Sentiment Analysis | SST-2 Binary classification | ASA + BERT-base | Accuracy | 94.1 | #35 |
| Sentiment Analysis | SST-2 Binary classification | ASA + RoBERTa | Accuracy | 96.3 | #17 |
| Semantic Textual Similarity | STS Benchmark | ASA + RoBERTa | Spearman Correlation | 0.892 | #8 |
| Semantic Textual Similarity | STS Benchmark | ASA + BERT-base | Spearman Correlation | 0.865 | #20 |
| Named Entity Recognition (NER) | WNUT 2017 | ASA + RoBERTa | F1 | 57.3 | #6 |
| Named Entity Recognition (NER) | WNUT 2017 | ASA + BERT-base | F1 | 49.8 | #15 |
