Semantic textual similarity deals with determining how similar two pieces of text are. This can take the form of assigning a score from 1 to 5 indicating how close the two texts are in meaning. Related tasks include paraphrase and duplicate identification.
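As a concrete illustration (a minimal sketch, not a method from any paper listed below), a sentence pair can be scored on that 1-5 scale by rescaling the cosine similarity of two precomputed sentence embeddings; how such embeddings are produced is exactly what the work below addresses:

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def sts_score(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Map cosine similarity from [-1, 1] onto the 1-5 STS scale.

    emb_a and emb_b are assumed to be sentence embeddings from some
    encoder (e.g. averaged word vectors or a trained sentence encoder).
    """
    return 1.0 + 2.0 * (cosine(emb_a, emb_b) + 1.0)
```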
ICLR 2018 • facebookresearch/InferSent
A lot of the recent success in natural language processing (NLP) has been driven by distributed vector representations of words trained on large amounts of text in an unsupervised manner. In this work, we present a simple, effective multi-task learning framework for sentence representations that combines the inductive biases of diverse training objectives in a single model.
#3 best model for Natural Language Inference on MultiNLI
MULTI-TASK LEARNING • NATURAL LANGUAGE INFERENCE • PARAPHRASE IDENTIFICATION • SEMANTIC TEXTUAL SIMILARITY
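To make the shared-model idea concrete, here is a hedged sketch of a multi-task setup in the spirit of the abstract above: one shared sentence encoder with a classification head per task, trained by alternating task batches. The dimensions, task names, and the single-sentence simplification are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Shared sentence encoder reused by every task head."""
    def __init__(self, vocab_size=10000, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True)

    def forward(self, token_ids):
        _, h = self.gru(self.embed(token_ids))
        return h[-1]  # final hidden state as the sentence embedding

encoder = SharedEncoder()
heads = nn.ModuleDict({
    "nli": nn.Linear(256, 3),        # entailment / neutral / contradiction
    "paraphrase": nn.Linear(256, 2)  # paraphrase / not paraphrase
})
optim = torch.optim.Adam(list(encoder.parameters()) + list(heads.parameters()))
loss_fn = nn.CrossEntropyLoss()

def train_step(task, token_ids, labels):
    """One update: encode with the shared encoder, classify with the
    task-specific head, backprop through both. Pair tasks would encode
    both sentences and combine them; omitted here for brevity."""
    logits = heads[task](encoder(token_ids))
    loss = loss_fn(logits, labels)
    optim.zero_grad()
    loss.backward()
    optim.step()
    return loss.item()
```

Alternating batches across tasks in this way is what lets the shared encoder absorb the inductive biases of each training objective.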
For both model variants, we investigate and report the relationship between model complexity, resource consumption, the availability of transfer-task training data, and task performance. We find that transfer learning using sentence embeddings tends to outperform word-level transfer.
SOTA for Text Classification on TREC-6
SEMANTIC TEXTUAL SIMILARITY • SENTENCE EMBEDDINGS • SENTIMENT ANALYSIS • SUBJECTIVITY ANALYSIS • TEXT CLASSIFICATION • TRANSFER LEARNING • WORD EMBEDDINGS
EMNLP 2017 • facebookresearch/InferSent
Many modern NLP systems rely on word embeddings, previously trained in an unsupervised manner on large corpora, as base features. Efforts to obtain embeddings for larger chunks of text, such as sentences, have, however, not been as successful.
#2 best model for Semantic Textual Similarity on SentEval
CROSS-LINGUAL NATURAL LANGUAGE INFERENCE • SEMANTIC TEXTUAL SIMILARITY • TRANSFER LEARNING • WORD EMBEDDINGS
Preprint 2018 • openai/finetune-transformer-lm
We demonstrate that large gains on natural language understanding tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task. We show the effectiveness of this approach on a wide range of natural language understanding benchmarks.
#3 best model for Natural Language Inference on SNLI
DOCUMENT CLASSIFICATION • LANGUAGE MODELLING • NATURAL LANGUAGE INFERENCE • QUESTION ANSWERING • SEMANTIC TEXTUAL SIMILARITY
COLING 2018 • lanwuwei/SPM_toolkit
In this paper, we analyze several neural network designs (and their variations) for sentence pair modeling and compare their performance extensively across eight datasets, including paraphrase identification, semantic textual similarity, natural language inference, and question answering tasks. Although most of these models have claimed state-of-the-art performance, the original papers often reported on only one or two selected datasets.
NATURAL LANGUAGE INFERENCE • PARAPHRASE IDENTIFICATION • QUESTION ANSWERING • SEMANTIC TEXTUAL SIMILARITY • SENTENCE PAIR MODELING
HLT 2018 • lanwuwei/SPM_toolkit
Sentence pair modeling is critical for many NLP tasks, such as paraphrase identification, semantic textual similarity, and natural language inference. Most state-of-the-art neural models for these tasks rely on pretrained word embeddings and compose sentence-level semantics in varied ways; however, few works have attempted to verify whether we really need pretrained embeddings for these tasks.
NATURAL LANGUAGE INFERENCE • PARAPHRASE IDENTIFICATION • SEMANTIC TEXTUAL SIMILARITY • SENTENCE PAIR MODELING
Determining semantic textual similarity is a core research subject in natural language processing. Since vector-based models for sentence representation often use shallow information, capturing accurate semantics is difficult.
HLT 2016 • nmrksic/counter-fitting
In this work, we present a novel counter-fitting method which injects antonymy and synonymy constraints into vector space representations in order to improve the vectors' capability for judging semantic similarity. Applying this method to publicly available pre-trained word vectors leads to new state-of-the-art performance on the SimLex-999 dataset.
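The published method optimizes an objective combining antonym repulsion, synonym attraction, and vector-space preservation; the following is only a rough sketch of the first two ideas, with margins, learning rate, and update rule that are illustrative assumptions rather than the authors' procedure:

```python
import numpy as np

def unit(v):
    """Renormalize a vector to unit length."""
    return v / np.linalg.norm(v)

def counter_fit(vectors, synonyms, antonyms, lr=0.05, epochs=20,
                ant_margin=1.0, syn_margin=0.4):
    """Simplified counter-fitting-style pass: push antonym pairs apart
    and pull synonym pairs together in cosine distance.

    vectors: dict word -> 1-D numpy array (modified in place)
    synonyms, antonyms: iterables of (word, word) pairs
    """
    for _ in range(epochs):
        for a, b in antonyms:
            u, v = vectors[a], vectors[b]
            # cosine distance d = 1 - cos; repel if the pair is closer
            # than the antonym margin
            if 1.0 - np.dot(unit(u), unit(v)) < ant_margin:
                vectors[a] = unit(u - lr * v)
                vectors[b] = unit(v - lr * u)
        for a, b in synonyms:
            u, v = vectors[a], vectors[b]
            # attract if the pair sits farther apart than the synonym margin
            if 1.0 - np.dot(unit(u), unit(v)) > syn_margin:
                vectors[a] = unit(u + lr * v)
                vectors[b] = unit(v + lr * u)
    return vectors
```

The paper's third term, which keeps each updated vector close to its original pre-trained position, is omitted here for brevity.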
WS 2017 • nathanshartmann/portuguese_word_embeddings
Word embeddings have been found to provide meaningful representations for words in an efficient way; therefore, they have become common in Natural Language Processing systems. In this paper, we evaluated different word embedding models trained on a large Portuguese corpus, including both Brazilian and European variants.
Current systems of fine-grained entity typing use distant supervision in conjunction with existing knowledge bases to assign categories (type labels) to entity mentions. However, the type labels so obtained from knowledge bases are often noisy (i.e., incorrect for the entity mention's local context).