STS

127 papers with code • 1 benchmark • 5 datasets

Semantic Textual Similarity (STS) measures the degree to which two pieces of text mean the same thing. Sentence pairs are typically annotated on a graded scale (e.g., 0-5, low to high similarity), and systems are evaluated by correlating their predicted scores with the human judgments.

Most implemented papers

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

UKPLab/sentence-transformers IJCNLP 2019

BERT requires that both sentences be fed into the network together, which causes a massive computational overhead: finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours).
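
The ~50 million figure is simply the number of unordered sentence pairs (10,000 × 9,999 / 2 ≈ 50M), each needing its own forward pass through a cross-encoder. Sentence-BERT's bi-encoder avoids this by encoding each sentence once and comparing cached embeddings with cosine similarity. A minimal sketch using the sentence-transformers library from the linked repository; the checkpoint name is only an example:

```python
from sentence_transformers import SentenceTransformer, util

# Bi-encoder: 10,000 sentences need 10,000 forward passes, not the
# ~50 million pairwise inferences a cross-encoder would require.
model = SentenceTransformer("all-MiniLM-L6-v2")  # example checkpoint

sentences = [
    "A man is playing a guitar.",
    "Someone is strumming a guitar.",
    "The weather is cold today.",
]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Pairwise cosine similarities over cached embeddings are cheap linear algebra.
scores = util.cos_sim(embeddings, embeddings)
print(scores)
```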

SimCSE: Simple Contrastive Learning of Sentence Embeddings

princeton-nlp/SimCSE EMNLP 2021

This paper presents SimCSE, a simple contrastive learning framework that greatly advances state-of-the-art sentence embeddings.
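
The unsupervised variant of SimCSE encodes the same batch of sentences twice; the encoder's dropout yields two slightly different views, and matching views serve as positives in an InfoNCE-style contrastive loss. A minimal PyTorch sketch of that loss (the 0.05 temperature follows the paper; the random tensors stand in for two encoder passes):

```python
import torch
import torch.nn.functional as F

def simcse_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE-style loss over two dropout-augmented views of one batch.

    z1, z2: (batch, dim) embeddings of the SAME sentences, produced by two
    forward passes with different dropout masks (unsupervised SimCSE).
    """
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    sim = z1 @ z2.T / temperature          # (batch, batch) cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)  # diagonal = positives
    return F.cross_entropy(sim, labels)

# Toy usage; in training z1/z2 come from the same encoder run twice.
z1, z2 = torch.randn(8, 768), torch.randn(8, 768)
print(simcse_loss(z1, z2))
```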

MedSTS: A Resource for Clinical Semantic Textual Similarity

ncbi-nlp/BioSentVec 28 Aug 2018

A subset of MedSTS (MedSTS_ann) containing 1,068 sentence pairs was annotated by two medical experts with semantic similarity scores of 0-5 (low to high similarity).

MTEB: Massive Text Embedding Benchmark

embeddings-benchmark/mteb 13 Oct 2022

MTEB spans 8 embedding tasks covering a total of 58 datasets and 112 languages.
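
MTEB ships as a Python package: wrap any model that exposes an encode() method and run a selection of tasks. A sketch following the usage pattern in the repository (the checkpoint and task name are examples):

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Any object with an `encode(sentences) -> embeddings` method can be
# evaluated; this checkpoint is just an example.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Select one STS task; MTEB can also select by task type or language.
evaluation = MTEB(tasks=["STSBenchmark"])
evaluation.run(model, output_folder="results/sts")
```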

Advancing Semantic Textual Similarity Modeling: A Regression Framework with Translated ReLU and Smooth K2 Loss

ZBWpro/STS-Regression 8 Jun 2024

Since the introduction of BERT and RoBERTa, research on Semantic Textual Similarity (STS) has made groundbreaking progress.

Pcc-tuning: Breaking the Contrastive Learning Ceiling in Semantic Textual Similarity

ZBWpro/Pcc-tuning 14 Jun 2024

Semantic Textual Similarity (STS) constitutes a critical research direction in computational linguistics and serves as a key indicator of the encoding capabilities of embedding models.

SemEval-2017 Task 1: Semantic Textual Similarity - Multilingual and Cross-lingual Focused Evaluation

txsun1997/metric-fairness 31 Jul 2017

Semantic Textual Similarity (STS) measures the meaning similarity of sentences.
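
SemEval-2017 Task 1 ranks systems by the Pearson correlation between predicted similarities and the gold 0-5 annotations; later embedding papers often report Spearman correlation instead. Because both metrics are invariant to scale, predictions need not live on the 0-5 range. A minimal evaluation sketch with SciPy (the scores are toy values):

```python
from scipy.stats import pearsonr, spearmanr

# Gold 0-5 similarity annotations and a system's predicted scores (toy data).
gold = [4.8, 2.5, 0.0, 3.2, 1.1]
pred = [0.92, 0.55, 0.08, 0.70, 0.30]

print("Pearson r:    %.3f" % pearsonr(gold, pred)[0])   # SemEval-2017 metric
print("Spearman rho: %.3f" % spearmanr(gold, pred)[0])  # common in later work
```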

KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding

kakaobrain/KorNLUDatasets Findings of the Association for Computational Linguistics 2020

Although several benchmark datasets for those tasks have been released in English and a few other languages, there are no publicly available NLI or STS datasets in the Korean language.

Don't Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors

Babylonpartners/fuzzymax ICLR 2019

Recent literature suggests that averaged word vectors followed by simple post-processing outperform many deep learning methods on semantic textual similarity tasks.
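
The paper's alternative is to max-pool word vectors element-wise and interpret the pooled vector as a fuzzy set, so two sentences can be compared via fuzzy Jaccard similarity (sum of element-wise minima over sum of maxima). A simplified NumPy sketch assuming non-negative features; the repository's full DynaMax algorithm additionally constructs a dynamic feature space per sentence pair:

```python
import numpy as np

def max_pool(word_vectors: np.ndarray) -> np.ndarray:
    """Element-wise max over a sentence's word vectors (rows = words)."""
    return word_vectors.max(axis=0)

def fuzzy_jaccard(u: np.ndarray, v: np.ndarray) -> float:
    """Fuzzy Jaccard: sum of element-wise minima over sum of maxima.

    Treats each vector as a fuzzy-set membership function, which assumes
    non-negative entries (hence the clipping at zero).
    """
    u, v = np.clip(u, 0.0, None), np.clip(v, 0.0, None)
    return float(np.minimum(u, v).sum() / np.maximum(u, v).sum())

# Toy sentences: random non-negative word vectors (rows = words).
s1 = np.abs(np.random.randn(5, 300))
s2 = np.abs(np.random.randn(7, 300))
print(fuzzy_jaccard(max_pool(s1), max_pool(s2)))
```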