69 papers with code • 1 benchmark • 4 datasets




Most implemented papers

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

UKPLab/sentence-transformers IJCNLP 2019

However, it requires that both sentences be fed into the network together, which causes a massive computational overhead: finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours) with BERT.
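The quoted figure is simply the number of unordered sentence pairs in the collection. A quick sketch (plain Python, no model required) reproduces it and contrasts the pair-wise cross-encoder cost with the one-embedding-per-sentence bi-encoder approach that Sentence-BERT proposes:

```python
# Why cross-encoders are costly for similarity search: every candidate
# pair needs its own forward pass, while a bi-encoder embeds each
# sentence once and compares cached vectors afterwards.
from math import comb

n = 10_000                           # collection size from the paper
cross_encoder_passes = comb(n, 2)    # one BERT inference per sentence pair
bi_encoder_passes = n                # one embedding per sentence

print(cross_encoder_passes)          # 49995000, the "about 50 million" above
print(bi_encoder_passes)             # 10000
```

After the 10,000 embeddings are computed, pairwise comparison reduces to cheap cosine similarities between cached vectors rather than full Transformer forward passes.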

SimCSE: Simple Contrastive Learning of Sentence Embeddings

princeton-nlp/SimCSE EMNLP 2021

This paper presents SimCSE, a simple contrastive learning framework that greatly advances state-of-the-art sentence embeddings.

SemEval-2017 Task 1: Semantic Textual Similarity - Multilingual and Cross-lingual Focused Evaluation

txsun1997/metric-fairness 31 Jul 2017

Semantic Textual Similarity (STS) measures the meaning similarity of sentences.

MedSTS: A Resource for Clinical Semantic Textual Similarity

ncbi-nlp/BioSentVec 28 Aug 2018

A subset of MedSTS (MedSTS_ann) containing 1,068 sentence pairs was annotated by two medical experts with semantic similarity scores of 0-5 (low to high similarity).

KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding

kakaobrain/KorNLUDatasets Findings of the Association for Computational Linguistics 2020

Although several benchmark datasets for those tasks have been released in English and a few other languages, there are no publicly available NLI or STS datasets in the Korean language.

Don't Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors

Babylonpartners/fuzzymax ICLR 2019

Recent literature suggests that averaged word vectors followed by simple post-processing outperform many deep learning methods on semantic textual similarity tasks.
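As a toy illustration (NumPy only, with random vectors standing in for real word embeddings), the two pooling schemes being contrasted differ only in the reduction applied across the words of a sentence:

```python
import numpy as np

rng = np.random.default_rng(0)
# toy "word vectors": one row per word, one column per embedding dimension
sent = rng.standard_normal((5, 8))

avg_vec = sent.mean(axis=0)   # classic averaged word vectors
max_vec = sent.max(axis=0)    # max-pooling: per-dimension maximum over words

print(avg_vec.shape, max_vec.shape)   # both are single (8,) sentence vectors
```

Both reductions collapse a variable-length sentence into a single fixed-size vector; the paper's argument is that the max-pooled variant, interpreted through fuzzy set theory, is a surprisingly strong baseline on STS tasks.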

FFCI: A Framework for Interpretable Automatic Evaluation of Summarization

fajri91/ffci 27 Nov 2020

In this paper, we propose FFCI, a framework for fine-grained summarization evaluation that comprises four elements: faithfulness (degree of factual consistency with the source), focus (precision of summary content relative to the reference), coverage (recall of summary content relative to the reference), and inter-sentential coherence (document fluency between adjacent sentences).

Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models

google-research/t5x_retrieval Findings (ACL) 2022

To support our investigation, we establish a new sentence representation transfer benchmark, SentGLUE, which extends the SentEval toolkit to nine tasks from the GLUE benchmark.

ESimCSE: Enhanced Sample Building Method for Contrastive Learning of Unsupervised Sentence Embedding

caskcsg/sentemb COLING 2022

Unsup-SimCSE takes dropout as a minimal data augmentation method, and passes the same input sentence to a pre-trained Transformer encoder (with dropout turned on) twice to obtain the two corresponding embeddings to build a positive pair.
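A minimal NumPy sketch of that trick (an illustrative stand-in for the dropout masks inside a real Transformer encoder, not the actual model):

```python
import numpy as np

rng = np.random.default_rng(42)

def dropout(x, p=0.1):
    # inverted dropout: a fresh random mask is drawn on every call,
    # just as each forward pass through the encoder samples new masks
    mask = rng.random(x.shape) >= p
    return x * mask / (1 - p)

h = np.ones(128)                 # stand-in for one sentence's hidden state
z1, z2 = dropout(h), dropout(h)  # two "views" of the same input sentence

# z1 and z2 agree wherever neither mask dropped a unit and differ
# elsewhere; Unsup-SimCSE treats such a pair of views as a positive pair
```

Because the two masks are sampled independently, the two embeddings of the same sentence are slightly different, which is exactly the minimal augmentation the contrastive objective needs.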