The main objective of Semantic Similarity is to measure the distance between the semantic meanings of a pair of words, phrases, sentences, or documents. For example, the word "car" is more similar to "bus" than to "cat". The two main approaches to measuring semantic similarity are knowledge-based methods and corpus-based, distributional methods.
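As a minimal sketch of the corpus-based approach, cosine similarity between word vectors captures exactly this ordering. The three-dimensional vectors below are made up purely for illustration; real systems use pretrained embeddings (e.g. word2vec or GloVe) with hundreds of dimensions.

```python
import numpy as np

# Toy vectors, chosen so that "car" and "bus" point in similar directions.
vectors = {
    "car": np.array([0.9, 0.1, 0.0]),
    "bus": np.array([0.8, 0.2, 0.1]),
    "cat": np.array([0.1, 0.9, 0.2]),
}

def cosine(a, b):
    """Cosine similarity: dot product of the normalized vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vectors["car"], vectors["bus"]))  # high: semantically close
print(cosine(vectors["car"], vectors["cat"]))  # low: semantically distant
```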
However, BERT requires that both sentences be fed into the network together, which causes a massive computational overhead: finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours) with BERT.
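The ~50 million figure follows directly from the number of unordered sentence pairs a cross-encoder must score, as this quick check shows:

```python
# Scoring every pair among n sentences needs n*(n-1)/2 forward passes.
n = 10_000
pairs = n * (n - 1) // 2
print(pairs)  # 49995000, i.e. roughly 50 million inference computations
```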
Because of their superior ability to preserve sequence information over time, Long Short-Term Memory (LSTM) networks, a type of recurrent neural network with a more complex computational unit, have obtained strong results on a variety of sequence modeling tasks.
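A common way to apply LSTMs to sentence similarity is a Siamese setup, where one shared encoder embeds both sentences and the final hidden states are compared. The sketch below is an assumed MaLSTM-style architecture with illustrative sizes, not the exact model from any one paper.

```python
import torch
import torch.nn as nn

class SiameseLSTM(nn.Module):
    """Shared LSTM encoder; similarity = exp(-L1 distance) in (0, 1]."""

    def __init__(self, vocab_size=5000, embed_dim=50, hidden_dim=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def encode(self, tokens):
        _, (h_n, _) = self.lstm(self.embed(tokens))
        return h_n[-1]  # final hidden state as the sentence vector

    def forward(self, sent_a, sent_b):
        a, b = self.encode(sent_a), self.encode(sent_b)
        return torch.exp(-torch.sum(torch.abs(a - b), dim=1))

model = SiameseLSTM()
s1 = torch.randint(0, 5000, (1, 12))  # token ids for sentence 1
s2 = torch.randint(0, 5000, (1, 12))  # token ids for sentence 2
print(model(s1, s2))  # similarity score in (0, 1]
```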
We present a novel language representation model enhanced by knowledge called ERNIE (Enhanced Representation through kNowledge IntEgration).
We demonstrate that large gains on these tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task.
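To make the two-stage recipe concrete, here is a toy sketch: the same backbone is first trained with a language-modeling head on unlabeled token streams, then reused with a fresh classification head for the downstream task. A small GRU stands in for the transformer, and all module names, sizes, and data here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinyBackbone(nn.Module):
    """Shared encoder reused across both training stages."""

    def __init__(self, vocab_size=1000, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return h  # (batch, seq, d_model)

backbone = TinyBackbone()
lm_head = nn.Linear(64, 1000)  # stage 1: predict the next token
clf_head = nn.Linear(64, 2)    # stage 2: predict the task label

# Stage 1: generative pre-training (next-token prediction, no labels needed).
tokens = torch.randint(0, 1000, (8, 16))
lm_logits = lm_head(backbone(tokens[:, :-1]))
lm_loss = nn.functional.cross_entropy(
    lm_logits.reshape(-1, 1000), tokens[:, 1:].reshape(-1))

# Stage 2: discriminative fine-tuning (labeled task, same backbone weights).
labels = torch.randint(0, 2, (8,))
clf_logits = clf_head(backbone(tokens)[:, -1])  # last hidden state
clf_loss = nn.functional.cross_entropy(clf_logits, labels)
```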
Calculating the similarity between words and sentences using a lexical database and corpus statistics
To calculate the semantic similarity between words and sentences, the proposed method follows an edge-based approach using a lexical database.
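WordNet path similarity is one standard edge-based measure of this kind: it scores two words by the length of the shortest path between their synsets in the is-a hierarchy. The snippet below illustrates the idea and is not the proposed method's exact formula.

```python
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

car = wn.synset('car.n.01')
bus = wn.synset('bus.n.01')
cat = wn.synset('cat.n.01')

print(car.path_similarity(bus))  # shorter path in the taxonomy -> higher score
print(car.path_similarity(cat))  # longer path -> lower score
```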
Current systems of fine-grained entity typing use distant supervision in conjunction with existing knowledge bases to assign categories (type labels) to entity mentions.
Word embeddings have been found to provide meaningful representations for words in an efficient way; therefore, they have become common in Natural Language Processing systems.
A subset of MedSTS (MedSTS_ann) containing 1,068 sentence pairs was annotated by two medical experts with semantic similarity scores of 0-5 (low to high similarity).
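Systems on STS-style benchmarks such as this are typically evaluated by correlating predicted scores with the gold 0-5 annotations. A minimal sketch, with made-up numbers standing in for real annotations and system output:

```python
from scipy.stats import pearsonr

gold = [4.5, 0.5, 3.0, 2.0, 5.0]  # hypothetical expert similarity labels
pred = [4.2, 1.0, 2.8, 2.5, 4.7]  # hypothetical system scores

r, _ = pearsonr(gold, pred)
print(f"Pearson r = {r:.3f}")  # higher correlation = better agreement with experts
```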