TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Sentence Embeddings For Biomedical Texts	BIOSSES	Q-gram (q = 3)	Pearson Correlation	0.723	# 10
Sentence Embeddings For Biomedical Texts	BIOSSES	Skip-thoughts	Pearson Correlation	0.485	# 11
Sentence Embeddings For Biomedical Texts	BIOSSES	Supervised combination of: Jaccard, Q-gram, sent2vec, Paragraph vector DM, skip-thoughts, fastText	Pearson Correlation	0.871	# 1
Sentence Embeddings For Biomedical Texts	BIOSSES	Unsupervised combination (mean) of: Jaccard, q-gram, Paragraph vector (PV-DBOW) and sent2vec	Pearson Correlation	0.846	# 2
Sentence Embeddings For Biomedical Texts	BIOSSES	Paragraph vector (PV-DBOW)	Pearson Correlation	0.804	# 5
Sentence Embeddings For Biomedical Texts	BIOSSES	Paragraph vector (PV-DM)	Pearson Correlation	0.819	# 3
Sentence Embeddings For Biomedical Texts	BIOSSES	Sent2vec	Pearson Correlation	0.798	# 6
Sentence Embeddings For Biomedical Texts	BIOSSES	fastText (CBOW, max pooling)	Pearson Correlation	0.253	# 14
Sentence Embeddings For Biomedical Texts	BIOSSES	fastText (skip-gram, max pooling)	Pearson Correlation	0.766	# 9

Neural sentence embedding models for semantic similarity estimation in the biomedical domain

1 Oct 2021 · Kathrin Blagec, Hong Xu, Asan Agibetov, Matthias Samwald

BACKGROUND: In this study, we investigated the efficacy of current state-of-the-art neural sentence embedding models for semantic similarity estimation of sentences from biomedical literature. We trained different neural embedding models on 1.7 million articles from the PubMed Open Access dataset, and evaluated them based on a biomedical benchmark set containing 100 sentence pairs annotated by human experts and a smaller contradiction subset derived from the original benchmark set.

RESULTS: With a Pearson correlation of 0.819, our best unsupervised model based on the Paragraph Vector Distributed Memory algorithm outperforms previous state-of-the-art results achieved on the BIOSSES biomedical benchmark set. Moreover, our proposed supervised model that combines different string-based similarity metrics with a neural embedding model surpasses previous ontology-dependent supervised state-of-the-art approaches in terms of Pearson's r (r=0.871) on the biomedical benchmark set. In contrast to the promising results for the original benchmark, we found our best models' performance on the smaller contradiction subset to be poor.

CONCLUSIONS: In this study we highlighted the value of neural network-based models for semantic similarity estimation in the biomedical domain by showing that they can keep up with and even surpass previous state-of-the-art approaches for semantic similarity estimation that depend on the availability of laboriously curated ontologies when evaluated on a biomedical benchmark set. Capturing contradictions and negations in biomedical sentences, however, emerged as an essential area for further work.
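The evaluation described in the abstract scores each sentence pair with a similarity measure and reports Pearson's r against human annotations. The sketch below illustrates that idea with two string-based measures (Jaccard and character q-grams with q = 3) and their unweighted mean. It is not the authors' pipeline: the whitespace tokenization, the set-overlap form of the q-gram measure, the helper names, and the three toy sentence pairs with their "human" scores are all assumptions invented for illustration.

```python
import math


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercase whitespace tokens (assumed tokenization)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def qgram_similarity(a: str, b: str, q: int = 3) -> float:
    """Set overlap of character q-grams; one of several possible q-gram variants."""
    def grams(s: str) -> set:
        s = s.lower()
        return {s[i:i + q] for i in range(len(s) - q + 1)}

    ga, gb = grams(a), grams(b)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


def mean_combination(a: str, b: str) -> float:
    """Unweighted mean of the two string measures (hypothetical helper)."""
    return (jaccard(a, b) + qgram_similarity(a, b)) / 2.0


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("Pearson's r is undefined for a constant sequence")
    return cov / (sx * sy)


# Toy data invented for this sketch; these are NOT BIOSSES pairs or scores.
pairs = [
    ("the protein binds to the receptor", "the protein binds the receptor", 4.0),
    ("the gene is expressed in liver cells",
     "expression of the gene occurs in liver tissue", 3.0),
    ("the drug reduces tumor growth",
     "patients were recruited from two hospitals", 0.0),
]

predicted = [mean_combination(s1, s2) for s1, s2, _ in pairs]
human = [score for _, _, score in pairs]
r = pearson(predicted, human)
print(f"Pearson's r on the toy pairs: {r:.3f}")
```

Swapping `mean_combination` for any other scoring function, such as cosine similarity between sentence embeddings, leaves the evaluation step unchanged, since Pearson's r only needs one predicted score per pair.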

Code

kathrinblagec/neural-sentence-embed… official

Tasks

Semantic Similarity

Semantic Textual Similarity

Sentence

Sentence Embedding

Sentence-Embedding

Sentence Embeddings For Biomedical Texts

Datasets

BIOSSES

Results from the Paper

Ranked #1 on Sentence Embeddings For Biomedical Texts on BIOSSES
