Greatest papers with code

LXMERT: Learning Cross-Modality Encoder Representations from Transformers

IJCNLP 2019 huggingface/transformers

In LXMERT, we build a large-scale Transformer model that consists of three encoders: an object relationship encoder, a language encoder, and a cross-modality encoder.

LANGUAGE MODELLING, QUESTION ANSWERING, VISUAL QUESTION ANSWERING, VISUAL REASONING
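The three-encoder layout described for LXMERT can be illustrated with a toy, dependency-free sketch. This is not LXMERT's code. The real model uses learned projections, multi-head attention, feed-forward sublayers, LayerNorm, and several stacked layers per encoder, all of which are omitted here. The function names and the small hand-written vectors are hypothetical. The sketch only shows the data flow: two single-modality encoders feed a cross-modality layer in which each stream attends to the other.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    # Scaled dot-product attention with no learned projections (toy simplification).
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return out


def residual(xs: List[Vector], ys: List[Vector]) -> List[Vector]:
    # Element-wise residual connection (LayerNorm and feed-forward sublayers omitted).
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]


def self_attention_encoder(xs: List[Vector]) -> List[Vector]:
    # One single-modality self-attention layer (hypothetical stand-in for a full encoder).
    return residual(xs, attend(xs, xs, xs))


def cross_modality_layer(lang: List[Vector], objs: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    # Both directions read the pre-update streams: words attend to objects, objects attend to words.
    lang_x = residual(lang, attend(lang, objs, objs))
    objs_x = residual(objs, attend(objs, lang, lang))
    # Each stream then self-attends within its own modality.
    return self_attention_encoder(lang_x), self_attention_encoder(objs_x)


def three_encoder_forward(words: List[Vector], objects: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    lang = self_attention_encoder(words)      # stands in for the language encoder
    objs = self_attention_encoder(objects)    # stands in for the object relationship encoder
    return cross_modality_layer(lang, objs)   # stands in for the cross-modality encoder


words = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # hypothetical word embeddings
objects = [[0.2, 0.8], [0.9, 0.1]]             # hypothetical object features
lang_out, obj_out = three_encoder_forward(words, objects)
print(len(lang_out), len(obj_out))  # prints 3 2
```

Each modality keeps its own sequence length through the cross-modality step. Only the contents of each vector change, because every word and every object mixes in information from the other modality.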

Inferring and Executing Programs for Visual Reasoning

ICCV 2017 facebookresearch/clevr-iep

Existing methods for visual reasoning attempt to directly map inputs to outputs using black-box architectures without explicitly modeling the underlying reasoning processes.

VISUAL QUESTION ANSWERING, VISUAL REASONING

Learning to Compose Dynamic Tree Structures for Visual Contexts

CVPR 2019 KaihuaTang/Scene-Graph-Benchmark.pytorch

We propose to compose dynamic tree structures that place the objects in an image into a visual context, helping visual reasoning tasks such as scene graph generation and visual Q&A.

GRAPH GENERATION, SCENE GRAPH GENERATION, VISUAL QUESTION ANSWERING, VISUAL REASONING

Learning by Abstraction: The Neural State Machine

NeurIPS 2019 stanfordnlp/mac-network

We introduce the Neural State Machine, seeking to bridge the gap between the neural and symbolic views of AI and integrate their complementary strengths for the task of visual reasoning.

VISUAL QUESTION ANSWERING, VISUAL REASONING

GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering

CVPR 2019 stanfordnlp/mac-network

We introduce GQA, a new dataset for real-world visual reasoning and compositional question answering, seeking to address key shortcomings of previous VQA datasets.

QUESTION ANSWERING, VISUAL QUESTION ANSWERING, VISUAL REASONING

Compositional Attention Networks for Machine Reasoning

ICLR 2018 stanfordnlp/mac-network

We present the MAC network, a novel fully differentiable neural network architecture, designed to facilitate explicit and expressive reasoning.

VISUAL QUESTION ANSWERING, VISUAL REASONING

PHYRE: A New Benchmark for Physical Reasoning

NeurIPS 2019 facebookresearch/phyre

The benchmark is designed to encourage the development of learning algorithms that are sample-efficient and generalize well across puzzles.

VISUAL REASONING

VisualBERT: A Simple and Performant Baseline for Vision and Language

9 Aug 2019 uclanlp/visualbert

We propose VisualBERT, a simple and flexible framework for modeling a broad range of vision-and-language tasks.

LANGUAGE MODELLING, VISUAL QUESTION ANSWERING, VISUAL REASONING

A Corpus for Reasoning About Natural Language Grounded in Photographs

ACL 2019 lil-lab/nlvr

We crowdsource the data using sets of visually rich images and a compare-and-contrast task to elicit linguistically diverse language.

VISUAL REASONING