Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019) have set a new state of the art on sentence-pair regression tasks such as semantic textual similarity (STS). However, they require that both sentences be fed into the network together, which causes a massive computational overhead: finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours) with BERT...

Published at IJCNLP 2019.
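To make the contrast with cross-encoder BERT concrete, the minimal sketch below encodes each sentence independently into a fixed-size embedding and then finds the most similar pair via cosine similarity over the precomputed vectors, which is the use case the abstract describes. It assumes the `sentence-transformers` package and the pretrained `bert-base-nli-mean-tokens` checkpoint; these names are illustrative choices, not taken from this page.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

sentences = [
    "A man is playing a guitar.",
    "Someone is playing an instrument.",
    "The weather is cold today.",
]

# Each sentence is encoded on its own, so a collection of n sentences
# needs only n forward passes instead of ~n^2/2 pairwise passes.
model = SentenceTransformer("bert-base-nli-mean-tokens")  # assumed checkpoint
embeddings = model.encode(sentences)  # shape: (n, 768)

# Pairwise cosine similarity on the precomputed embeddings.
normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
similarity = normed @ normed.T

# Most similar pair, excluding self-similarity on the diagonal.
np.fill_diagonal(similarity, -1.0)
i, j = np.unravel_index(similarity.argmax(), similarity.shape)
print(sentences[i], "<->", sentences[j], float(similarity[i, j]))
```

Because the embeddings are computed once and compared with simple vector operations, the pairwise search itself costs only a matrix multiplication, which is what reduces the 65-hour BERT cross-encoding scenario to seconds.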
| Task | Dataset | Model | Metric | Value | Global Rank |
| --- | --- | --- | --- | --- | --- |
| Semantic Textual Similarity | SICK | SBERT-NLI-base | Spearman Correlation | 0.7291 | #6 |
| Semantic Textual Similarity | SICK | SBERT-NLI-large | Spearman Correlation | 0.7375 | #5 |
| Semantic Textual Similarity | SICK | SRoBERTa-NLI-large | Spearman Correlation | 0.7429 | #3 |
| Semantic Textual Similarity | SICK | SRoBERTa-NLI-base | Spearman Correlation | 0.7446 | #2 |
| Semantic Textual Similarity | STS12 | SRoBERTa-NLI-large | Spearman Correlation | 0.7453 | #2 |
| Semantic Textual Similarity | STS13 | SBERT-NLI-large | Spearman Correlation | 0.7846 | #3 |
| Semantic Textual Similarity | STS14 | SBERT-NLI-large | Spearman Correlation | 0.749 | #2 |
| Semantic Textual Similarity | STS15 | SRoBERTa-NLI-large | Spearman Correlation | 0.8185 | #2 |
| Semantic Textual Similarity | STS16 | SRoBERTa-NLI-large | Spearman Correlation | 0.7682 | #4 |
| Semantic Textual Similarity | STS Benchmark | SRoBERTa-NLI-STSb-large | Spearman Correlation | 0.8615 | #9 |
| Semantic Textual Similarity | STS Benchmark | SBERT-STSb-large | Spearman Correlation | 0.8445 | #14 |
| Semantic Textual Similarity | STS Benchmark | SBERT-NLI-large | Spearman Correlation | 0.791 | #15 |
| Semantic Textual Similarity | STS Benchmark | SBERT-NLI-base | Spearman Correlation | 0.7703 | #18 |
| Semantic Textual Similarity | STS Benchmark | SRoBERTa-NLI-base | Spearman Correlation | 0.7777 | #17 |
| Semantic Textual Similarity | STS Benchmark | SBERT-STSb-base | Spearman Correlation | 0.8535 | #11 |
| Semantic Textual Similarity | STS Benchmark | SBERT-NLI-STSb-large | Spearman Correlation | 0.861 | #10 |
| Semantic Textual Similarity | STS Benchmark | SRoBERTa-NLI-STSb-base | Spearman Correlation | 0.8479 | #13 |

Methods used in the Paper


| Method | Type |
| --- | --- |
| Residual Connection | Skip Connections |
| Attention Dropout | Regularization |
| Linear Warmup With Linear Decay | Learning Rate Schedules |
| Weight Decay | Regularization |
| RoBERTa | Transformers |
| GELU | Activation Functions |
| Dense Connections | Feedforward Networks |
| Adam | Stochastic Optimization |
| WordPiece | Subword Segmentation |
| Softmax | Output Functions |
| Dropout | Regularization |
| Multi-Head Attention | Attention Modules |
| Layer Normalization | Normalization |
| Scaled Dot-Product Attention | Attention Mechanisms |
| BERT | Language Models |