Visual Entailment Task for Visually-Grounded Language Learning

We introduce a new inference task, Visual Entailment (VE), which differs from traditional Textual Entailment (TE) tasks in that the premise is defined by an image rather than a natural language sentence. A novel dataset, SNLI-VE (publicly available), is proposed for VE tasks, based on the Stanford Natural Language Inference corpus and Flickr30k...
