Visual Entailment: A Novel Task for Fine-Grained Image Understanding

20 Jan 2019 · Ning Xie, Farley Lai, Derek Doran, Asim Kadav

Existing visual reasoning datasets, such as Visual Question Answering (VQA), often suffer from biases conditioned on the question, image, or answer distributions. The recently proposed CLEVR dataset addresses these limitations and requires fine-grained reasoning, but it is synthetic and consists of similar objects and sentence structures across the dataset. In this paper, we introduce a new inference task, Visual Entailment (VE), consisting of image-sentence pairs in which the premise is defined by an image rather than a natural language sentence as in traditional Textual Entailment tasks. The goal of a trained VE model is to predict whether the image semantically entails the text. To realize this task, we build a dataset, SNLI-VE, based on the Stanford Natural Language Inference (SNLI) corpus and the Flickr30k dataset. We evaluate various existing VQA baselines and propose a model called Explainable Visual Entailment (EVE) to address the VE task. EVE achieves up to 71% accuracy and outperforms several other state-of-the-art VQA-based models. Finally, we demonstrate the explainability of EVE through cross-modal attention visualizations. The SNLI-VE dataset is publicly available at https://github.com/necla-ml/SNLI-VE.
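Because SNLI premises are captions drawn from Flickr30k, each SNLI example can be re-paired with its source image to form an (image, hypothesis, label) triple, which is the core of the SNLI-VE construction described above. The snippet below is a minimal sketch of that pairing, not the authors' release code; it assumes the standard SNLI JSONL fields (captionID, sentence2, gold_label), and the file paths are hypothetical placeholders.

```python
import json
from pathlib import Path

# Hypothetical local paths; point these at your SNLI and Flickr30k copies.
SNLI_JSONL = Path("snli_1.0/snli_1.0_train.jsonl")
FLICKR30K_IMAGES = Path("flickr30k_images")


def build_snli_ve(snli_jsonl: Path, image_dir: Path):
    """Pair each SNLI hypothesis with the Flickr30k image its premise captions.

    SNLI's captionID looks like '3416050480.jpg#4'; the part before '#' is the
    Flickr30k image filename. The text premise is dropped and replaced by the
    image, yielding (image_path, hypothesis, label) triples for the VE task.
    """
    examples = []
    with snli_jsonl.open() as f:
        for line in f:
            record = json.loads(line)
            label = record["gold_label"]
            if label == "-":  # skip examples without annotator consensus
                continue
            image_file = record["captionID"].split("#")[0]
            examples.append((image_dir / image_file, record["sentence2"], label))
    return examples


if __name__ == "__main__":
    triples = build_snli_ve(SNLI_JSONL, FLICKR30K_IMAGES)
    print(f"{len(triples)} image-hypothesis pairs")
```

Each resulting triple carries one of the three entailment labels (entailment, neutral, contradiction), so a VE model is trained as a three-way classifier over an image premise and a text hypothesis.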


Datasets

SNLI-VE · SNLI · Flickr30k

Results from the Paper


Task              | Dataset      | Model    | Metric   | Value | Global Rank
Visual Entailment | SNLI-VE test | EVE-ROI* | Accuracy | 70.47 | #8
Visual Entailment | SNLI-VE val  | EVE-ROI* | Accuracy | 70.81 | #9

Methods


No methods listed for this paper.