Visual Entailment
27 papers with code • 3 benchmarks • 3 datasets
Visual Entailment (VE) is a task consisting of image-sentence pairs in which the premise is defined by an image, rather than by a natural language sentence as in traditional Textual Entailment tasks. The goal is to predict whether the image semantically entails the text.
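As a minimal sketch of the task setup (not the architecture of any listed paper), a VE model encodes the image premise and the text hypothesis separately, fuses the two representations, and predicts a three-way label (entailment / neutral / contradiction). The backbone choices, fusion scheme, and hyperparameters below are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18
from transformers import AutoModel, AutoTokenizer

class VEClassifier(nn.Module):
    """Toy visual entailment classifier: image premise + text hypothesis -> 3-way label."""
    def __init__(self, hidden_dim=512, num_labels=3):
        super().__init__()
        # Image premise encoder: ResNet-18 backbone with the final fc layer removed.
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()
        self.image_encoder = backbone  # outputs 512-d features
        # Text hypothesis encoder: any small BERT-style model works for illustration.
        self.text_encoder = AutoModel.from_pretrained("bert-base-uncased")
        # Fuse both modalities and predict entailment / neutral / contradiction.
        self.classifier = nn.Sequential(
            nn.Linear(512 + self.text_encoder.config.hidden_size, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_labels),
        )

    def forward(self, images, input_ids, attention_mask):
        img_feat = self.image_encoder(images)                        # (B, 512)
        txt_out = self.text_encoder(input_ids=input_ids,
                                    attention_mask=attention_mask)
        txt_feat = txt_out.last_hidden_state[:, 0]                   # [CLS] token
        return self.classifier(torch.cat([img_feat, txt_feat], dim=-1))

# Usage: score one image-sentence pair.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = VEClassifier()
enc = tokenizer(["two dogs are playing in the snow"], return_tensors="pt")
images = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed image
logits = model(images, enc["input_ids"], enc["attention_mask"])
label = ["entailment", "neutral", "contradiction"][logits.argmax(-1).item()]
```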
Libraries
Use these libraries to find Visual Entailment models and implementations.

Most implemented papers
Visual Entailment: A Novel Task for Fine-Grained Image Understanding
We evaluate various existing VQA baselines and build an Explainable Visual Entailment (EVE) system to address the VE task.
Check It Again: Progressive Visual Question Answering via Visual Entailment
Moreover, existing methods only explore the interaction between the image and the question, ignoring the semantics of candidate answers.
NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks
Current NLE models explain the decision-making process of a vision or vision-language model (a.k.a., task model), e.g., a VQA model, via a language model (a.k.a., explanation model), e.g., GPT.
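A minimal illustration of that two-model setup: a language model (GPT-2 here) is prompted with a task model's question and predicted answer and asked to continue with a free-text rationale. The prompt format, the hypothetical VQA output, and the model choice are assumptions for illustration, not the NLX-GPT method itself.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Output of a hypothetical VQA task model for one image-question pair.
question = "What is the man holding?"
predicted_answer = "a surfboard"

# The explanation model continues the prompt with a natural-language rationale.
prompt = f"question: {question} answer: {predicted_answer} because"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output_ids = model.generate(
    input_ids,
    max_new_tokens=20,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
explanation = tokenizer.decode(output_ids[0][input_ids.shape[1]:],
                               skip_special_tokens=True)
print(explanation)
```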
Fine-Grained Visual Entailment
In this paper, we propose an extension of this task, where the goal is to predict the logical relationship of fine-grained knowledge elements within a piece of text to an image.
MixGen: A New Multi-Modal Data Augmentation
Data augmentation is a necessity to enhance data efficiency in deep learning.
Chunk-aware Alignment and Lexical Constraint for Visual Entailment with Natural Language Explanations
The proposed method consists of a Chunk-aware Semantic Interactor (CSI), a relation inferrer, and a Lexical Constraint-aware Generator (LeCG).
Prompt Tuning for Generative Multimodal Pretrained Models
Prompt tuning has become a new paradigm for model tuning and it has demonstrated success in natural language pretraining and even vision pretraining.
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment
Vision and Language Pretraining has become the prevalent approach for tackling multimodal downstream tasks.
MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model
Multimodal semantic understanding often has to deal with uncertainty, which means the obtained messages tend to refer to multiple targets.