103 papers with code • 1 benchmark • 4 datasets

Semantic Textual Similarity (STS) measures the degree to which two sentences convey the same meaning, typically annotated on a graded scale (e.g. 0–5, low to high similarity).



Most implemented papers

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

UKPLab/sentence-transformers IJCNLP 2019

BERT, however, requires that both sentences be fed into the network together, which causes massive computational overhead: finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours).
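The contrast the abstract draws can be sketched directly: a cross-encoder must run one forward pass per sentence *pair*, while a bi-encoder like SBERT embeds each sentence once and compares the resulting vectors cheaply. The sketch below checks the pair count for 10,000 sentences and shows a cosine comparison over toy embeddings (the 4-d vectors are hypothetical, not real SBERT output).

```python
import math

# Cross-encoder cost: every unordered pair must pass through the network.
n = 10_000
pair_count = n * (n - 1) // 2
print(pair_count)  # -> 49995000, the ~50 million inferences the abstract cites

# Bi-encoder (SBERT-style) cost: one forward pass per sentence, then cheap
# vector comparisons. Cosine similarity over toy "embeddings":
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

emb1 = [0.1, 0.3, 0.5, 0.1]  # hypothetical sentence embeddings
emb2 = [0.1, 0.2, 0.6, 0.1]
score = cosine(emb1, emb2)
```

Once embeddings are precomputed, each comparison is a dot product rather than a full transformer forward pass, which is what collapses the ~65 hours to seconds.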

SimCSE: Simple Contrastive Learning of Sentence Embeddings

princeton-nlp/SimCSE EMNLP 2021

This paper presents SimCSE, a simple contrastive learning framework that greatly advances state-of-the-art sentence embeddings.
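SimCSE's unsupervised variant encodes each sentence twice with different dropout masks and treats the two encodings as a positive pair, with other sentences in the batch as negatives, under an InfoNCE-style contrastive objective. A minimal NumPy sketch of that loss over already-computed embeddings follows; the embeddings and temperature value are illustrative, not the authors' implementation.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.05):
    """InfoNCE-style loss: z1[i] and z2[i] are two encodings of the same
    sentence (the positive pair); all other rows act as in-batch negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / temperature  # cosine similarity matrix, temperature-scaled
    # Cross-entropy with the diagonal (matching pair) as the correct class.
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

rng = np.random.default_rng(0)
base = rng.normal(size=(8, 16))                  # hypothetical sentence embeddings
z1 = base + 0.01 * rng.normal(size=base.shape)   # two "dropout-noised" views
z2 = base + 0.01 * rng.normal(size=base.shape)   # of the same sentences
loss = info_nce_loss(z1, z2)
```

Because the two views of each sentence are nearly identical, the diagonal similarities dominate and the loss is small; training pushes real encoders toward the same structure.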

MedSTS: A Resource for Clinical Semantic Textual Similarity

ncbi-nlp/BioSentVec 28 Aug 2018

A subset of MedSTS (MedSTS_ann) containing 1,068 sentence pairs was annotated by two medical experts with semantic similarity scores of 0-5 (low to high similarity).

SemEval-2017 Task 1: Semantic Textual Similarity - Multilingual and Cross-lingual Focused Evaluation

txsun1997/metric-fairness 31 Jul 2017

Semantic Textual Similarity (STS) measures the meaning similarity of sentences.
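SemEval STS systems are conventionally scored by the Pearson correlation between predicted similarities and the gold 0-5 annotations. A self-contained sketch of that metric, with hypothetical gold and predicted scores:

```python
import numpy as np

# Hypothetical gold annotations (0-5 scale) and system predictions.
gold = np.array([4.8, 3.2, 0.5, 2.9, 1.1])
pred = np.array([4.5, 3.0, 1.0, 3.3, 0.8])

def pearson(x, y):
    """Pearson correlation: covariance of centered vectors over the
    product of their standard deviations."""
    x = x - x.mean()
    y = y - y.mean()
    return float((x @ y) / np.sqrt((x @ x) * (y @ y)))

r = pearson(gold, pred)
```

A value of `r` near 1.0 means the system ranks and spaces pairs much like the human annotators, even if the absolute scores differ by a constant shift or scale.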

KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding

kakaobrain/KorNLUDatasets Findings of the Association for Computational Linguistics 2020

Although several benchmark datasets for those tasks have been released in English and a few other languages, there are no publicly available NLI or STS datasets in the Korean language.

Don't Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors

Babylonpartners/fuzzymax ICLR 2019

Recent literature suggests that averaged word vectors followed by simple post-processing outperform many deep learning methods on semantic textual similarity tasks.
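The title's contrast is between two ways of composing word vectors into a sentence vector: averaging them versus taking the element-wise maximum (which the paper motivates through fuzzy sets). A sketch of both poolings over toy word vectors (the 4-d values are invented for illustration):

```python
import numpy as np

# Toy word vectors for a three-word sentence, one row per word.
words = np.array([
    [0.2, 0.9, 0.1, 0.4],
    [0.8, 0.1, 0.3, 0.2],
    [0.1, 0.2, 0.7, 0.9],
])

mean_pooled = words.mean(axis=0)  # averaged word vectors (the common baseline)
max_pooled = words.max(axis=0)    # element-wise max over words

print(max_pooled)  # -> [0.8 0.9 0.7 0.9]
```

Max-pooling keeps each dimension's strongest activation across the sentence, so a salient word cannot be diluted by averaging with many neutral ones.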

FFCI: A Framework for Interpretable Automatic Evaluation of Summarization

fajri91/ffci 27 Nov 2020

In this paper, we propose FFCI, a framework for fine-grained summarization evaluation that comprises four elements: faithfulness (degree of factual consistency with the source), focus (precision of summary content relative to the reference), coverage (recall of summary content relative to the reference), and inter-sentential coherence (document fluency between adjacent sentences).

PatentSBERTa: A Deep NLP based Hybrid Model for Patent Distance and Classification using Augmented SBERT

AI-Growth-Lab/Patent-Classification 22 Mar 2021

This study provides an efficient approach for using text data to calculate patent-to-patent (p2p) technological similarity, and presents a hybrid framework for leveraging the resulting p2p similarity for applications such as semantic search and automated patent classification.

Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models

google-research/t5x_retrieval Findings (ACL) 2022

To support our investigation, we establish a new sentence representation transfer benchmark, SentGLUE, which extends the SentEval toolkit to nine tasks from the GLUE benchmark.