Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders

EMNLP 2021 · Fangyu Liu, Ivan Vulić, Anna Korhonen, Nigel Collier

Pretrained Masked Language Models (MLMs) have revolutionised NLP in recent years. However, previous work has indicated that off-the-shelf MLMs are not effective as universal lexical or sentence encoders without further task-specific fine-tuning on NLI, sentence similarity, or paraphrasing tasks using annotated task data. In this work, we demonstrate that it is possible to turn MLMs into effective universal lexical and sentence encoders even without any additional data and without any supervision. We propose an extremely simple, fast and effective contrastive learning technique, termed Mirror-BERT, which converts MLMs (e.g., BERT and RoBERTa) into such encoders in 20-30 seconds without any additional external knowledge. Mirror-BERT relies on fully identical or slightly modified string pairs as positive (i.e., synonymous) fine-tuning examples, and aims to maximise their similarity during identity fine-tuning. We report huge gains over off-the-shelf MLMs with Mirror-BERT in both lexical-level and sentence-level tasks, across different domains and different languages. Notably, in the standard sentence semantic similarity (STS) tasks, our self-supervised Mirror-BERT model even matches the performance of the task-tuned Sentence-BERT models from prior work. Finally, we delve deeper into the inner workings of MLMs, and suggest some evidence on why this simple approach can yield effective universal lexical and sentence encoders.
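The contrastive idea in the abstract (treat two views of the same string as a positive pair and pull them together, while other strings in the batch act as negatives) can be illustrated with a small InfoNCE-style loss. This is a minimal sketch, not the authors' implementation: the toy vectors stand in for encoder outputs, the function names `cosine` and `info_nce_loss` are hypothetical, the temperature value is an illustrative assumption, and negatives are drawn only from the second view for simplicity.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(views_a, views_b, temperature=0.05):
    """Mean InfoNCE loss where views_a[i] and views_b[i] form a positive pair.

    Every views_b[j] with j != i serves as an in-batch negative for views_a[i].
    The temperature is an assumed, illustrative value.
    """
    n = len(views_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(views_a[i], views_b[j]) / temperature for j in range(n)]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits)
        )
        # Negative log-probability of picking the true positive among all candidates.
        total += log_denominator - logits[i]
    return total / n


# Toy "embeddings" of two strings, each encoded twice (hypothetical values).
first_views = [[1.0, 0.0], [0.0, 1.0]]
second_views = [[0.9, 0.1], [0.1, 0.9]]

aligned_loss = info_nce_loss(first_views, second_views)
misaligned_loss = info_nce_loss(first_views, list(reversed(second_views)))
print(aligned_loss < misaligned_loss)
```

The loss is small when each string's two views are closer to each other than to the views of other strings, which is the property the fine-tuning objective rewards.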

Code

cambridgeltl/mirror-bert official

Tasks

Contrastive Learning

Cross-Lingual Semantic Textual Similarity

Entity Linking

Semantic Similarity

Semantic Textual Similarity

Sentence

Sentence Similarity

STS

Datasets

GLUE

MultiNLI

SNLI

QNLI

SICK

STS Benchmark

COMETA

Results from the Paper

Ranked #15 on Semantic Textual Similarity on STS16

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Semantic Textual Similarity	SICK	Mirror-BERT-base (unsup.)	Spearman Correlation	0.703	#16
Semantic Textual Similarity	SICK	Mirror-RoBERTa-base (unsup.)	Spearman Correlation	0.706	#15
Semantic Textual Similarity	STS12	Mirror-BERT-base (unsup.)	Spearman Correlation	0.674	#18
Semantic Textual Similarity	STS12	Mirror-RoBERTa-base (unsup.)	Spearman Correlation	0.648	#20
Semantic Textual Similarity	STS13	Mirror-BERT-base (unsup.)	Spearman Correlation	0.796	#20
Semantic Textual Similarity	STS13	Mirror-RoBERTa-base (unsup.)	Spearman Correlation	0.819	#17
Semantic Textual Similarity	STS14	Mirror-BERT-base (unsup.)	Spearman Correlation	0.713	#18
Semantic Textual Similarity	STS14	Mirror-RoBERTa-base (unsup.)	Spearman Correlation	0.732	#17
Semantic Textual Similarity	STS15	Mirror-BERT-base (unsup.)	Spearman Correlation	0.814	#17
Semantic Textual Similarity	STS15	Mirror-RoBERTa-base (unsup.)	Spearman Correlation	0.798	#19
Semantic Textual Similarity	STS16	Mirror-BERT-base (unsup.)	Spearman Correlation	0.743	#19
Semantic Textual Similarity	STS16	Mirror-RoBERTa-base (unsup.)	Spearman Correlation	0.78	#15
Semantic Textual Similarity	STS Benchmark	Mirror-RoBERTa-base (unsup.)	Spearman Correlation	0.787	#32
Semantic Textual Similarity	STS Benchmark	Mirror-BERT-base (unsup.)	Spearman Correlation	0.764	#37
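
The metric in the table, Spearman correlation, compares the ranking of model similarity scores with the ranking of human similarity judgements. The sketch below shows one way to compute it with the standard library only: convert both score lists to ranks (ties share an average rank) and take the Pearson correlation of the ranks. The helper names and the sample scores are hypothetical and are not taken from the paper or its evaluation scripts.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length lists with non-zero variance."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance_x = sum((a - mean_x) ** 2 for a in x)
    variance_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(variance_x * variance_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed inputs."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical gold similarity ratings and model cosine similarities for five pairs.
gold_scores = [4.8, 1.2, 3.5, 0.4, 2.9]
model_scores = [0.91, 0.35, 0.62, 0.20, 0.70]
print(round(spearman(gold_scores, model_scores), 3))
```

Because only ranks matter, a model does not need to reproduce the gold scale; it only needs to order sentence pairs the way annotators do.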

Methods

Adam • Attention Dropout • BERT • Contrastive Learning • Dense Connections • Dropout • GELU • Layer Normalization • Linear Layer • Linear Warmup With Linear Decay • Mirror-BERT • Multi-Head Attention • Residual Connection • Scaled Dot-Product Attention • Softmax • Weight Decay • WordPiece