A large annotated corpus for learning natural language inference

Understanding entailment and contradiction is fundamental to understanding natural language, and inference about entailment and contradiction is a valuable testing ground for the development of semantic representations. However, machine learning research in this area has been dramatically limited by the lack of large-scale resources. To address this, we introduce the Stanford Natural Language Inference corpus, a new, freely available collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning. At 570K pairs, it is two orders of magnitude larger than all other resources of its type. This increase in scale allows lexicalized classifiers to outperform some sophisticated existing entailment models, and it allows a neural network-based model to perform competitively on natural language inference benchmarks for the first time.

Published at EMNLP 2015.

Datasets


Introduced in the Paper:

SNLI

Used in the Paper:

Flickr30k, SICK

Results from the Paper


Task                        Dataset  Model                          Metric Name       Metric Value  Global Rank
Natural Language Inference  SNLI     + Unigram and bigram features  % Test Accuracy   78.2          # 89
                                                                    % Train Accuracy  99.7          # 1
                                                                    Parameters                      # 3
Natural Language Inference  SNLI     Unlexicalized features         % Test Accuracy   50.4          # 91
                                                                    % Train Accuracy  49.4          # 72
                                                                    Parameters                      # 3
Natural Language Inference  SNLI     100D LSTM encoders             % Test Accuracy   77.6          # 90
                                                                    % Train Accuracy  84.8          # 67
                                                                    Parameters        220k          # 3
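
To make the "Unigram and bigram features" row more concrete, the following is a minimal sketch of what lexicalized features for a sentence pair might look like. It is not the paper's exact feature set. The function names, feature names, whitespace tokenization, and the example sentences are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def lexicalized_features(premise, hypothesis):
    """Build a sparse feature map for one premise/hypothesis pair.

    Assumptions: lowercased whitespace tokenization and hypothetical
    feature names; a real system would use a proper tokenizer.
    """
    p = premise.lower().split()
    h = hypothesis.lower().split()
    feats = Counter()
    # Indicator features for each unigram and bigram in the hypothesis.
    for n in (1, 2):
        for gram in ngrams(h, n):
            feats["hyp_%dgram=%s" % (n, " ".join(gram))] = 1
    # A simple count of distinct word types shared by the two sentences.
    feats["overlap"] = len(set(p) & set(h))
    return feats


# Made-up example pair, not taken from the corpus.
feats = lexicalized_features("A man plays a guitar", "A man sleeps")
```

A feature map like this could be fed to any linear classifier over the inference labels. Features that depend on specific words are what distinguish a lexicalized model from the "Unlexicalized features" row above.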
