Natural Language Inference
729 papers with code • 43 benchmarks • 77 datasets
Natural language inference (NLI) is the task of determining whether a "hypothesis" is true (entailment), false (contradiction), or undetermined (neutral) given a "premise".
Example:
| Premise | Label | Hypothesis |
|---|---|---|
| A man inspects the uniform of a figure in some East Asian country. | contradiction | The man is sleeping. |
| An older and younger man smiling. | neutral | Two men are smiling and laughing at the cats playing on the floor. |
| A soccer game with multiple males playing. | entailment | Some men are playing a sport. |
Approaches to NLI range from earlier symbolic and statistical methods to more recent deep learning models. Benchmark datasets used for NLI include SNLI, MultiNLI, and SciTail, among others. You can get hands-on practice on the SNLI task by following this d2l.ai chapter.
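For a concrete feel of the task, here is a minimal sketch of running an off-the-shelf NLI classifier on a premise/hypothesis pair. It assumes the Hugging Face `transformers` library and the publicly available `roberta-large-mnli` checkpoint; any MultiNLI-trained model works the same way.

```python
# Minimal NLI inference sketch (assumes transformers + the roberta-large-mnli checkpoint).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."

# Encode the premise/hypothesis pair and run a single forward pass.
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring logit back to its label (contradiction / neutral / entailment).
label = model.config.id2label[logits.argmax(dim=-1).item()]
print(label)  # expected: ENTAILMENT
```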
Libraries
Use these libraries to find Natural Language Inference models and implementations.

Latest papers
Edinburgh Clinical NLP at SemEval-2024 Task 2: Fine-tune your model unless you have access to GPT-4
The NLI4CT task assesses Natural Language Inference systems in predicting whether hypotheses entail or contradict evidence from Clinical Trial Reports.
Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation
Overall, we show that learning with synthetic instruction tuning datasets is an effective way to adapt language models to new domains.
On the use of Silver Standard Data for Zero-shot Classification Tasks in Information Extraction
Recent zero-shot classification methods convert the task into other NLP tasks (e.g., textual entailment) and use off-the-shelf models for those tasks to perform inference directly on the test data without requiring large amounts of IE annotation data.
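As a rough illustration of this general recipe (not this paper's specific method), an entailment model can be repurposed for zero-shot classification by treating each candidate label as a hypothesis. The sketch below assumes the `transformers` zero-shot-classification pipeline and the `facebook/bart-large-mnli` checkpoint; the input text and labels are illustrative.

```python
# Hedged sketch: zero-shot classification via an off-the-shelf entailment model.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

text = "The company announced a merger with its largest competitor."
candidate_labels = ["business", "sports", "politics"]  # assumed label set

# Each label is turned into a hypothesis ("This example is about {label}.")
# and scored against the text with the entailment model.
result = classifier(text, candidate_labels)
print(result["labels"][0], result["scores"][0])  # top label and its score
```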
Fine-Grained Natural Language Inference Based Faithfulness Evaluation for Diverse Summarisation Tasks
We study existing approaches to leverage off-the-shelf Natural Language Inference (NLI) models for the evaluation of summary faithfulness and argue that these are sub-optimal due to the granularity level considered for premises and hypotheses.
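As background on the generic recipe being critiqued (not the paper's proposed method), summary faithfulness is often scored by treating the source document as the premise and each summary sentence as a hypothesis. The checkpoint and example texts below are assumptions for illustration.

```python
# Hedged sketch of sentence-level NLI-based faithfulness scoring
# (a generic recipe, not the method proposed in the paper above).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "roberta-large-mnli"  # assumed off-the-shelf NLI checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

document = "The city council approved the new budget on Tuesday after a long debate."
summary_sentences = [
    "The council approved the budget.",
    "The vote took place on Friday.",  # unfaithful detail
]

entail_id = model.config.label2id["ENTAILMENT"]
for sent in summary_sentences:
    # Premise = source document, hypothesis = one summary sentence.
    inputs = tokenizer(document, sent, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1)
    # A low entailment probability flags potentially unfaithful content.
    print(f"{probs[0, entail_id].item():.2f}  {sent}")
```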
GPT-HateCheck: Can LLMs Write Better Functional Tests for Hate Speech Detection?
A recent proposal in this direction is HateCheck, a suite for testing fine-grained model functionalities on synthesized data generated using templates of the kind "You are just a [slur] to me."
Conformalized Credal Set Predictors
Credal sets are sets of probability distributions that are considered as candidates for an imprecisely known ground-truth distribution.
Pixel Sentence Representation Learning
To our knowledge, this is the first representation learning method devoid of traditional language models for understanding sentence and document semantics, marking a stride closer to human-like textual comprehension.
Plausible Extractive Rationalization through Semi-Supervised Entailment Signal
The increasing use of complex and opaque black-box models requires the adoption of interpretable measures; one such option is extractive rationalizing models, which serve as a more interpretable alternative.
A Hypothesis-Driven Framework for the Analysis of Self-Rationalising Models
The self-rationalising capabilities of LLMs are appealing because the generated explanations can give insights into the plausibility of the predictions.
HQA-Attack: Toward High Quality Black-Box Hard-Label Adversarial Attack on Text
Black-box hard-label adversarial attack on text is a practical and challenging task, as the text data space is inherently discrete and non-differentiable, and only the predicted label is accessible.