Natural Language Inference
730 papers with code • 43 benchmarks • 77 datasets
Natural language inference (NLI) is the task of determining whether a "hypothesis" is true (entailment), false (contradiction), or undetermined (neutral) given a "premise".
Example:
| Premise | Label | Hypothesis |
|---|---|---|
| A man inspects the uniform of a figure in some East Asian country. | contradiction | The man is sleeping. |
| An older and younger man smiling. | neutral | Two men are smiling and laughing at the cats playing on the floor. |
| A soccer game with multiple males playing. | entailment | Some men are playing a sport. |
Approaches used for NLI range from earlier symbolic and statistical methods to more recent deep learning approaches. Benchmark datasets used for NLI include SNLI, MultiNLI, and SciTail, among others. You can get hands-on practice on the SNLI task by following this d2l.ai chapter.
Latest papers
CASPR: Automated Evaluation Metric for Contrastive Summarization
Summarizing comparative opinions about entities (e.g., hotels, phones) from a set of source reviews, often referred to as contrastive summarization, can considerably aid users in decision making.
TLDR at SemEval-2024 Task 2: T5-generated clinical-Language summaries for DeBERTa Report Analysis
This paper introduces novel methodologies for the Natural Language Inference for Clinical Trials (NLI4CT) task.
XNLIeu: a dataset for cross-lingual NLI in Basque
We have conducted a series of experiments using mono- and multilingual LLMs to assess a) the effect of professional post-edition on the MT system; b) the best cross-lingual strategy for NLI in Basque; and c) whether the choice of the best cross-lingual strategy is influenced by the fact that the dataset is built by translation.
IITK at SemEval-2024 Task 2: Exploring the Capabilities of LLMs for Safe Biomedical Natural Language Inference for Clinical Trials
Large Language models (LLMs) have demonstrated state-of-the-art performance in various natural language processing (NLP) tasks across multiple domains, yet they are prone to shortcut learning and factual inconsistencies.
Forget NLI, Use a Dictionary: Zero-Shot Topic Classification for Low-Resource Languages with Application to Luxembourgish
A common method for ZSC is to fine-tune a language model on a Natural Language Inference (NLI) dataset and then use it to infer the entailment between the input document and the target labels.
Investigating the Robustness of Modelling Decisions for Few-Shot Cross-Topic Stance Detection: A Preregistered Study
In this paper, we investigate the robustness of operationalization choices for few-shot stance detection, with special attention to modelling stance across different topics.
Evaluating Generative Language Models in Information Extraction as Subjective Question Correction
Two challenges are highlighted: (1) the imprecision of existing evaluation metrics, which struggle to effectively gauge semantic consistency between model outputs and ground truth, and (2) the inherent incompleteness of evaluation benchmarks, primarily due to restrictive human annotation schemas, resulting in underestimated LLM performance.
Affective-NLI: Towards Accurate and Interpretable Personality Recognition in Conversation
To utilize affectivity within dialog content for accurate personality recognition, we fine-tuned a pre-trained language model specifically for emotion recognition in conversations, facilitating real-time affective annotations for utterances.
On the Role of Summary Content Units in Text Summarization Evaluation
At the heart of the Pyramid evaluation method for text summarization lie human written summary content units (SCUs).
AILS-NTUA at SemEval-2024 Task 6: Efficient model tuning for hallucination detection and analysis
In this paper, we present our team's submissions for SemEval-2024 Task 6: SHROOM, a shared task on Hallucinations and Related Observable Overgeneration Mistakes.