STS

103 papers with code • 1 benchmark • 4 datasets

Benchmarks

These leaderboards are used to track progress in STS

Libraries

Use these libraries to find STS models and implementations

UKPLab/sentence-transformers

2 papers

13,762

princeton-nlp/SimCSE

2 papers

3,243

climsocana/tecb-de

2 papers

Datasets

Most implemented papers

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

UKPLab/sentence-transformers • IJCNLP 2019

However, it requires that both sentences are fed into the network, which causes a massive computational overhead: finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours) with BERT.
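
The figure of about 50 million follows from counting unordered pairs: a model that must see both sentences together has to score each of the n(n-1)/2 pairs separately. A minimal sketch of that arithmetic follows; the function name `pairwise_inferences` is illustrative, and the per-pair time is derived here from the quoted ~65 hours rather than stated in the abstract.

```python
def pairwise_inferences(n: int) -> int:
    """Number of unordered sentence pairs among n sentences."""
    return n * (n - 1) // 2


pairs = pairwise_inferences(10_000)
print(pairs)  # 49995000, i.e. about 50 million

# Per-pair cost implied by ~65 hours in total (a derived estimate, not a source figure).
seconds_per_pair = 65 * 3600 / pairs
print(f"{seconds_per_pair * 1000:.2f} ms per pair")
```

Because the pair count grows quadratically, doubling the collection size roughly quadruples the number of inference calls.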


SimCSE: Simple Contrastive Learning of Sentence Embeddings

princeton-nlp/SimCSE • EMNLP 2021

This paper presents SimCSE, a simple contrastive learning framework that greatly advances state-of-the-art sentence embeddings.


TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning

UKPLab/sentence-transformers • 14 Apr 2021

Learning sentence embeddings often requires a large amount of labeled data.


MedSTS: A Resource for Clinical Semantic Textual Similarity

ncbi-nlp/BioSentVec • 28 Aug 2018

A subset of MedSTS (MedSTS_ann) containing 1,068 sentence pairs was annotated by two medical experts with semantic similarity scores of 0-5 (low to high similarity).


SemEval-2017 Task 1: Semantic Textual Similarity - Multilingual and Cross-lingual Focused Evaluation

txsun1997/metric-fairness • 31 Jul 2017

Semantic Textual Similarity (STS) measures the meaning similarity of sentences.
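
One common way to turn this definition into a number is to embed each sentence as a vector and compare the vectors by cosine similarity. The sketch below assumes that setup; the three-dimensional vectors are made-up placeholders rather than outputs of any real model, and `cosine_similarity` is an illustrative helper, not an API from the listed libraries.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical sentence embeddings (placeholders for illustration only).
emb_a = [1.0, 0.0, 1.0]
emb_b = [1.0, 0.0, 1.0]  # same direction as emb_a
emb_c = [0.0, 1.0, 0.0]  # orthogonal to emb_a

print(cosine_similarity(emb_a, emb_b))  # close to 1.0
print(cosine_similarity(emb_a, emb_c))  # 0.0
```

Cosine scores lie in [-1, 1], so comparing them with human ratings on another scale is usually done through a correlation measure rather than by direct subtraction.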


KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding

kakaobrain/KorNLUDatasets • Findings of the Association for Computational Linguistics 2020

Although several benchmark datasets for those tasks have been released in English and a few other languages, there are no publicly available NLI or STS datasets in the Korean language.


Don't Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors

Babylonpartners/fuzzymax • ICLR 2019

Recent literature suggests that averaged word vectors followed by simple post-processing outperform many deep learning methods on semantic textual similarity tasks.
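
To make the contrast in the title concrete, the sketch below computes both an averaged and a max-pooled sentence vector from the same word vectors. It illustrates only the two pooling operations, not the paper's full method; the two-dimensional word vectors and the helper names `mean_pool` and `max_pool` are assumptions made for illustration.

```python
def mean_pool(vectors):
    """Average the word vectors dimension by dimension."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def max_pool(vectors):
    """Keep the largest value in each dimension across the word vectors."""
    return [max(col) for col in zip(*vectors)]


# Hypothetical word vectors for a three-word sentence.
words = [[1.0, 0.0], [0.0, 2.0], [2.0, 1.0]]

print(mean_pool(words))  # [1.0, 1.0]
print(max_pool(words))   # [2.0, 2.0]
```

Averaging dilutes a strong feature when only one word carries it, whereas max-pooling keeps the strongest activation per dimension.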


FFCI: A Framework for Interpretable Automatic Evaluation of Summarization

fajri91/ffci • 27 Nov 2020

In this paper, we propose FFCI, a framework for fine-grained summarization evaluation that comprises four elements: faithfulness (degree of factual consistency with the source), focus (precision of summary content relative to the reference), coverage (recall of summary content relative to the reference), and inter-sentential coherence (document fluency between adjacent sentences).


PatentSBERTa: A Deep NLP based Hybrid Model for Patent Distance and Classification using Augmented SBERT

AI-Growth-Lab/Patent-Classification • 22 Mar 2021

This study provides an efficient approach for using text data to calculate patent-to-patent (p2p) technological similarity, and presents a hybrid framework for leveraging the resulting p2p similarity for applications such as semantic search and automated patent classification.


Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models

google-research/t5x_retrieval • Findings (ACL) 2022

To support our investigation, we establish a new sentence representation transfer benchmark, SentGLUE, which extends the SentEval toolkit to nine tasks from the GLUE benchmark.
