Natural Language Inference
730 papers with code • 43 benchmarks • 77 datasets
Natural language inference (NLI) is the task of determining whether a "hypothesis" is true (entailment), false (contradiction), or undetermined (neutral) given a "premise".
Example:
| Premise | Label | Hypothesis |
|---|---|---|
| A man inspects the uniform of a figure in some East Asian country. | contradiction | The man is sleeping. |
| An older and younger man smiling. | neutral | Two men are smiling and laughing at the cats playing on the floor. |
| A soccer game with multiple males playing. | entailment | Some men are playing a sport. |
Approaches used for NLI range from earlier symbolic and statistical methods to more recent deep learning approaches. Benchmark datasets used for NLI include SNLI, MultiNLI, and SciTail, among others. You can get hands-on practice on the SNLI task by following this d2l.ai chapter.
Latest papers
CASPR: Automated Evaluation Metric for Contrastive Summarization
Summarizing comparative opinions about entities (e.g., hotels, phones) from a set of source reviews, often referred to as contrastive summarization, can considerably aid users in decision making.
TLDR at SemEval-2024 Task 2: T5-generated clinical-Language summaries for DeBERTa Report Analysis
This paper introduces novel methodologies for the Natural Language Inference for Clinical Trials (NLI4CT) task.
XNLIeu: a dataset for cross-lingual NLI in Basque
We have conducted a series of experiments using mono- and multilingual LLMs to assess a) the effect of professional post-edition on the MT system; b) the best cross-lingual strategy for NLI in Basque; and c) whether the choice of the best cross-lingual strategy is influenced by the fact that the dataset is built by translation.
IITK at SemEval-2024 Task 2: Exploring the Capabilities of LLMs for Safe Biomedical Natural Language Inference for Clinical Trials
Large Language models (LLMs) have demonstrated state-of-the-art performance in various natural language processing (NLP) tasks across multiple domains, yet they are prone to shortcut learning and factual inconsistencies.
Forget NLI, Use a Dictionary: Zero-Shot Topic Classification for Low-Resource Languages with Application to Luxembourgish
A common method for ZSC is to fine-tune a language model on a Natural Language Inference (NLI) dataset and then use it to infer the entailment between the input document and the target labels.
Investigating the Robustness of Modelling Decisions for Few-Shot Cross-Topic Stance Detection: A Preregistered Study
In this paper, we investigate the robustness of operationalization choices for few-shot stance detection, with special attention to modelling stance across different topics.
Evaluating Generative Language Models in Information Extraction as Subjective Question Correction
Two challenges are highlighted: (1) the imprecision of existing evaluation metrics, which struggle to effectively gauge semantic consistency between model outputs and ground truth, and (2) the inherent incompleteness of evaluation benchmarks, primarily due to restrictive human annotation schemas, resulting in underestimated LLM performance.
Affective-NLI: Towards Accurate and Interpretable Personality Recognition in Conversation
To utilize affectivity within dialog content for accurate personality recognition, we fine-tuned a pre-trained language model specifically for emotion recognition in conversations, facilitating real-time affective annotations for utterances.
On the Role of Summary Content Units in Text Summarization Evaluation
At the heart of the Pyramid evaluation method for text summarization lie human written summary content units (SCUs).
AILS-NTUA at SemEval-2024 Task 6: Efficient model tuning for hallucination detection and analysis
In this paper, we present our team's submissions for SemEval-2024 Task 6: SHROOM, a shared task on Hallucinations and Related Observable Overgeneration Mistakes.