Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks

NAACL 2021 · Nandan Thakur, Nils Reimers, Johannes Daxenberger, Iryna Gurevych

There are two approaches for pairwise sentence scoring: Cross-encoders, which perform full-attention over the input pair, and Bi-encoders, which map each input independently to a dense vector space. While cross-encoders often achieve higher performance, they are too slow for many practical use cases. Bi-encoders, on the other hand, require substantial training data and fine-tuning over the target task to achieve competitive performance. We present a simple yet efficient data augmentation strategy called Augmented SBERT, where we use the cross-encoder to label a larger set of input pairs to augment the training data for the bi-encoder. We show that, in this process, selecting the sentence pairs is non-trivial and crucial for the success of the method. We evaluate our approach on multiple tasks (in-domain) as well as on a domain adaptation task. Augmented SBERT achieves an improvement of up to 6 points for in-domain and of up to 37 points for domain adaptation tasks compared to the original bi-encoder performance.
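The data flow described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: `token_overlap` is a toy lexical scorer standing in for a trained cross-encoder, `sample_pairs` is a hypothetical stand-in for the pair-selection step (a real pipeline might use BM25 or semantic search), and training the bi-encoder on the combined data is left out entirely. All function names here are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]
LabeledPair = Tuple[str, str, float]


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase tokens (toy stand-in for a cross-encoder score)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def sample_pairs(sentences: List[str], top_k: int) -> List[Pair]:
    """Pair each sentence with its top_k most lexically similar partners.

    Hypothetical selection strategy: the abstract only states that the
    choice of pairs is crucial, not that this particular rule is used.
    """
    index_pairs = set()
    for i, s in enumerate(sentences):
        scored = [(token_overlap(s, t), j) for j, t in enumerate(sentences) if j != i]
        scored.sort(key=lambda x: (-x[0], x[1]))  # best score first, ties by index
        for _, j in scored[:top_k]:
            index_pairs.add((min(i, j), max(i, j)))  # store each pair once
    return [(sentences[i], sentences[j]) for i, j in sorted(index_pairs)]


def build_silver_set(pairs: List[Pair],
                     teacher: Callable[[str, str], float]) -> List[LabeledPair]:
    """Label the selected pairs with the teacher (the cross-encoder's role)."""
    return [(a, b, teacher(a, b)) for a, b in pairs]


def augment(gold: List[LabeledPair],
            unlabeled: List[str],
            teacher: Callable[[str, str], float],
            top_k: int = 1) -> List[LabeledPair]:
    """Return gold plus teacher-labeled (silver) pairs for bi-encoder training."""
    silver = build_silver_set(sample_pairs(unlabeled, top_k), teacher)
    return gold + silver
```

In a real setup the `teacher` callable would wrap a cross-encoder fine-tuned on the gold data, and the returned list would be used to fine-tune the bi-encoder.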

Code

UKPLab/sentence-transformers (official) · 13,762 stars

Tasks

Data Augmentation

Domain Adaptation

Paraphrase Identification within Bi-Encoder

Semantic Textual Similarity

Semantic Textual Similarity within Bi-Encoder

Sentence

Sentence Pair Modeling

Datasets

GLUE

SNLI

MRPC

Quora Question Pairs

Results from the Paper

Ranked #1 on Semantic Textual Similarity within Bi-Encoder on MRPC

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Semantic Textual Similarity within Bi-Encoder	MRPC	AugSBERT-BM25	F1	85.46%	# 1
Paraphrase Identification within Bi-Encoder	Quora Question Pairs	AugSBERT-KDE	Spearmanr	79.31%	# 1

Methods

Adam • Augmented SBERT • BERT • Dropout • SBERT • Siamese Network • Softmax
