STS
127 papers with code • 1 benchmark • 5 datasets
Benchmarks
These leaderboards are used to track progress in STS.
Libraries
Use these libraries to find STS models and implementations.
Most implemented papers
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
However, BERT requires that both sentences be fed into the network together, which causes massive computational overhead: finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours).
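Sentence-BERT avoids this by encoding each sentence independently into a fixed-size vector, so similarity search reduces to cheap cosine comparisons. A minimal sketch using the sentence-transformers library (the checkpoint name is one of its published models; `pip install sentence-transformers` assumed):

```python
from sentence_transformers import SentenceTransformer, util

# Load a pretrained SBERT checkpoint (downloads on first use).
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "A man is eating food.",
    "A man is eating a piece of bread.",
    "The girl is carrying a baby.",
]

# Each sentence is encoded independently -- O(n) forward passes,
# instead of O(n^2) pairwise passes through a BERT cross-encoder.
embeddings = model.encode(sentences, convert_to_tensor=True)

# Pairwise cosine similarities between all sentence embeddings.
scores = util.cos_sim(embeddings, embeddings)
print(scores)
```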
SimCSE: Simple Contrastive Learning of Sentence Embeddings
This paper presents SimCSE, a simple contrastive learning framework that greatly advances state-of-the-art sentence embeddings.
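In its unsupervised form, SimCSE feeds the same batch of sentences through the encoder twice; standard dropout produces two slightly different embeddings per sentence, which serve as positive pairs against in-batch negatives. A toy PyTorch sketch of that loss (the encoder and temperature are placeholders, not the paper's exact setup):

```python
import torch
import torch.nn.functional as F

def simcse_loss(z1, z2, temperature=0.05):
    """InfoNCE over in-batch negatives.

    z1, z2: [batch, dim] embeddings of the SAME sentences from two
    dropout-perturbed forward passes; (z1[i], z2[i]) are positives.
    """
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    sim = z1 @ z2.T / temperature          # [batch, batch] cosine sims
    labels = torch.arange(z1.size(0))      # positives lie on the diagonal
    return F.cross_entropy(sim, labels)

# Two passes over the same inputs; dropout makes the two views differ:
# z1 = encoder(batch); z2 = encoder(batch); loss = simcse_loss(z1, z2)
```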
TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning
Learning sentence embeddings often requires a large amount of labeled data.
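TSDAE instead trains on raw, unlabeled text: each input sentence is corrupted (e.g. by deleting tokens), the encoder compresses it to a single vector, and a decoder must reconstruct the original from that vector alone. A rough sketch using the TSDAE utilities shipped with sentence-transformers (model names and hyperparameters are illustrative):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, losses
from sentence_transformers.datasets import DenoisingAutoEncoderDataset

# Plain, unlabeled sentences are all the training data needed.
train_sentences = ["Your corpus sentences go here.", "One per entry."]

# Encoder: transformer + CLS pooling into a single sentence vector.
word_emb = models.Transformer("bert-base-uncased")
pooling = models.Pooling(word_emb.get_word_embedding_dimension(), "cls")
model = SentenceTransformer(modules=[word_emb, pooling])

# The dataset applies deletion noise; the loss adds a tied decoder
# that must reconstruct the original sentence from the embedding.
dataset = DenoisingAutoEncoderDataset(train_sentences)
loader = DataLoader(dataset, batch_size=8, shuffle=True)
loss = losses.DenoisingAutoEncoderLoss(
    model, decoder_name_or_path="bert-base-uncased", tie_encoder_decoder=True
)

model.fit(train_objectives=[(loader, loss)], epochs=1, show_progress_bar=True)
```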
MedSTS: A Resource for Clinical Semantic Textual Similarity
A subset of MedSTS (MedSTS_ann) containing 1,068 sentence pairs was annotated by two medical experts with semantic similarity scores of 0-5 (low to high similarity).
MTEB: Massive Text Embedding Benchmark
MTEB spans 8 embedding tasks covering a total of 58 datasets and 112 languages.
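The companion mteb package can score any model that exposes an `encode()` method across these tasks; a minimal run on a single English STS task might look like this (task and model names are examples drawn from the library's documentation):

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Restrict the benchmark to one STS task; omit `tasks` to run more.
evaluation = MTEB(tasks=["STSBenchmark"])
results = evaluation.run(model, output_folder="results/minilm")
```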
Advancing Semantic Textual Similarity Modeling: A Regression Framework with Translated ReLU and Smooth K2 Loss
Since the introduction of BERT and RoBERTa, research on Semantic Textual Similarity (STS) has made groundbreaking progress.
Pcc-tuning: Breaking the Contrastive Learning Ceiling in Semantic Textual Similarity
Semantic Textual Similarity (STS) constitutes a critical research direction in computational linguistics and serves as a key indicator of the encoding capabilities of embedding models.
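Assuming "Pcc" here refers to the Pearson correlation coefficient (the metric STS is scored on), one route past a contrastive ceiling is to optimize that correlation directly on annotated pairs. A hedged PyTorch sketch of a differentiable Pearson loss, not necessarily the paper's exact formulation:

```python
import torch

def pearson_loss(pred, gold, eps=1e-8):
    """Negative Pearson correlation between predicted similarities
    and gold scores; minimizing it maximizes the correlation."""
    pred_c = pred - pred.mean()
    gold_c = gold - gold.mean()
    r = (pred_c * gold_c).sum() / (pred_c.norm() * gold_c.norm() + eps)
    return -r

pred = torch.tensor([0.9, 0.2, 0.6], requires_grad=True)
gold = torch.tensor([5.0, 1.0, 3.5])
loss = pearson_loss(pred, gold)
loss.backward()  # gradients flow, so this can drive fine-tuning
```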
SemEval-2017 Task 1: Semantic Textual Similarity - Multilingual and Cross-lingual Focused Evaluation
Semantic Textual Similarity (STS) measures the meaning similarity of sentences.
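SemEval STS systems are conventionally scored by correlating predicted similarities with the 0-5 gold annotations (Pearson is the official metric; Spearman is also commonly reported). With scipy that evaluation is a few lines; the scores below are made up for illustration:

```python
from scipy.stats import pearsonr, spearmanr

gold = [4.8, 1.2, 3.0, 0.4, 2.5]   # human 0-5 similarity ratings
pred = [4.5, 0.9, 3.3, 1.0, 2.2]   # a system's predicted scores

print("Pearson: %.3f" % pearsonr(gold, pred)[0])
print("Spearman: %.3f" % spearmanr(gold, pred)[0])
```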
KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding
Although several benchmark datasets for those tasks have been released in English and a few other languages, there are no publicly available NLI or STS datasets in the Korean language.
Don't Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors
Recent literature suggests that averaged word vectors followed by simple post-processing outperform many deep learning methods on semantic textual similarity tasks.
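The contrast the paper draws is easy to state in code: instead of averaging a sentence's word vectors, take the elementwise maximum, which the authors motivate through fuzzy set theory. A toy numpy sketch, with random vectors standing in for real pretrained word embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for pretrained word vectors: one 300-d vector per token.
sent_a = rng.standard_normal((5, 300))   # 5-token sentence
sent_b = rng.standard_normal((7, 300))   # 7-token sentence

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Averaged word vectors vs. elementwise max-pooled word vectors.
avg_sim = cosine(sent_a.mean(axis=0), sent_b.mean(axis=0))
max_sim = cosine(sent_a.max(axis=0), sent_b.max(axis=0))
print(f"avg-pool sim: {avg_sim:.3f}  max-pool sim: {max_sim:.3f}")
```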