Natural Language Inference
729 papers with code • 43 benchmarks • 77 datasets
Natural language inference (NLI) is the task of determining whether a "hypothesis" is true (entailment), false (contradiction), or undetermined (neutral) given a "premise".
Example:
| Premise | Label | Hypothesis |
|---|---|---|
| A man inspects the uniform of a figure in some East Asian country. | contradiction | The man is sleeping. |
| An older and younger man smiling. | neutral | Two men are smiling and laughing at the cats playing on the floor. |
| A soccer game with multiple males playing. | entailment | Some men are playing a sport. |
Approaches to NLI range from earlier symbolic and statistical methods to more recent deep learning models. Benchmark datasets used for NLI include SNLI, MultiNLI, and SciTail, among others. You can get hands-on practice on the SNLI task by following this d2l.ai chapter.
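For a concrete feel of the task, here is a minimal sketch of running an off-the-shelf NLI classifier on a premise/hypothesis pair. It assumes the Hugging Face `transformers` library and the publicly available `roberta-large-mnli` checkpoint; any MultiNLI-trained model works the same way.

```python
# Minimal NLI inference sketch (assumes transformers + the roberta-large-mnli checkpoint).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."

# Encode the premise/hypothesis pair and run a single forward pass.
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring logit back to its label (contradiction / neutral / entailment).
label = model.config.id2label[logits.argmax(dim=-1).item()]
print(label)  # expected: ENTAILMENT
```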
Libraries
Use these libraries to find Natural Language Inference models and implementations.

Latest papers
Edinburgh Clinical NLP at SemEval-2024 Task 2: Fine-tune your model unless you have access to GPT-4
The NLI4CT task assesses Natural Language Inference systems in predicting whether hypotheses entail or contradict evidence from Clinical Trial Reports.
Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation
Overall, we show that learning with synthetic instruction tuning datasets is an effective way to adapt language models to new domains.
On the use of Silver Standard Data for Zero-shot Classification Tasks in Information Extraction
Recent zero-shot classification methods convert the task into other NLP tasks (e.g., textual entailment) and use off-the-shelf models for those tasks to perform inference directly on the test data without requiring large amounts of IE annotation data.
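As a rough illustration of this general recipe (not this paper's specific method), an entailment model can be repurposed for zero-shot classification by treating each candidate label as a hypothesis. The sketch below assumes the `transformers` zero-shot-classification pipeline and the `facebook/bart-large-mnli` checkpoint; the input text and labels are illustrative.

```python
# Hedged sketch: zero-shot classification via an off-the-shelf entailment model.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

text = "The company announced a merger with its largest competitor."
candidate_labels = ["business", "sports", "politics"]  # assumed label set

# Each label is turned into a hypothesis ("This example is about {label}.")
# and scored against the text with the entailment model.
result = classifier(text, candidate_labels)
print(result["labels"][0], result["scores"][0])  # top label and its score
```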
Fine-Grained Natural Language Inference Based Faithfulness Evaluation for Diverse Summarisation Tasks
We study existing approaches to leverage off-the-shelf Natural Language Inference (NLI) models for the evaluation of summary faithfulness and argue that these are sub-optimal due to the granularity level considered for premises and hypotheses.
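As background on the generic recipe being critiqued (not the paper's proposed method), summary faithfulness is often scored by treating the source document as the premise and each summary sentence as a hypothesis. The checkpoint and example texts below are assumptions for illustration.

```python
# Hedged sketch of sentence-level NLI-based faithfulness scoring
# (a generic recipe, not the method proposed in the paper above).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "roberta-large-mnli"  # assumed off-the-shelf NLI checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

document = "The city council approved the new budget on Tuesday after a long debate."
summary_sentences = [
    "The council approved the budget.",
    "The vote took place on Friday.",  # unfaithful detail
]

entail_id = model.config.label2id["ENTAILMENT"]
for sent in summary_sentences:
    # Premise = source document, hypothesis = one summary sentence.
    inputs = tokenizer(document, sent, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1)
    # A low entailment probability flags potentially unfaithful content.
    print(f"{probs[0, entail_id].item():.2f}  {sent}")
```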
GPT-HateCheck: Can LLMs Write Better Functional Tests for Hate Speech Detection?
A recent proposal in this direction is HateCheck, a suite for testing fine-grained model functionalities on synthesized data generated using templates of the kind "You are just a [slur] to me."
Conformalized Credal Set Predictors
Credal sets are sets of probability distributions that are considered as candidates for an imprecisely known ground-truth distribution.
Pixel Sentence Representation Learning
To our knowledge, this is the first representation learning method devoid of traditional language models for understanding sentence and document semantics, marking a stride closer to human-like textual comprehension.
Plausible Extractive Rationalization through Semi-Supervised Entailment Signal
The increasing use of complex and opaque black-box models requires the adoption of interpretable measures; one such option is extractive rationalizing models, which serve as a more interpretable alternative.
A Hypothesis-Driven Framework for the Analysis of Self-Rationalising Models
The self-rationalising capabilities of LLMs are appealing because the generated explanations can give insights into the plausibility of the predictions.
HQA-Attack: Toward High Quality Black-Box Hard-Label Adversarial Attack on Text
Black-box hard-label adversarial attack on text is a practical and challenging task, as the text data space is inherently discrete and non-differentiable, and only the predicted label is accessible.