Visual Reasoning

85 papers with code • 7 benchmarks • 25 datasets

The ability to understand actions and carry out reasoning over visual images.



Most implemented papers

Compositional Attention Networks for Machine Reasoning

stanfordnlp/mac-network ICLR 2018

We present the MAC network, a novel fully differentiable neural network architecture, designed to facilitate explicit and expressive reasoning.

LXMERT: Learning Cross-Modality Encoder Representations from Transformers

airsplay/lxmert EMNLP-IJCNLP 2019

In LXMERT, we build a large-scale Transformer model that consists of three encoders: an object relationship encoder, a language encoder, and a cross-modality encoder.
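The three-encoder layout described above can be sketched with plain scaled dot-product attention: two self-attention streams (one per modality) followed by a cross-modality step in which each stream attends to the other. This is a minimal single-head NumPy sketch of the data flow only, with hypothetical function names; the real LXMERT encoders are multi-layer, multi-head Transformers with feed-forward sublayers and residual connections.

```python
import numpy as np

def attention(q, k, v):
    """Single-head scaled dot-product attention, no masking."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v

def lxmert_flow(lang_tokens, vis_regions):
    """Illustrative data flow for LXMERT's three encoders (hypothetical
    names): a language encoder, an object-relationship encoder, and a
    cross-modality encoder where each stream attends to the other."""
    lang = attention(lang_tokens, lang_tokens, lang_tokens)  # language encoder
    vis = attention(vis_regions, vis_regions, vis_regions)   # object-relationship encoder
    lang_x = attention(lang, vis, vis)                       # cross: language queries vision
    vis_x = attention(vis, lang, lang)                       # cross: vision queries language
    return lang_x, vis_x

# Toy inputs: 5 word tokens and 36 detected object regions, dim 8.
tokens = np.random.default_rng(0).standard_normal((5, 8))
regions = np.random.default_rng(1).standard_normal((36, 8))
lang_out, vis_out = lxmert_flow(tokens, regions)
print(lang_out.shape, vis_out.shape)  # (5, 8) (36, 8)
```

Each output stream keeps its own sequence length while mixing in information from the other modality, which is the key property of the cross-modality encoder.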

Inferring and Executing Programs for Visual Reasoning

facebookresearch/clevr-iep ICCV 2017

Existing methods for visual reasoning attempt to directly map inputs to outputs using black-box architectures without explicitly modeling the underlying reasoning processes.

Learning to Compose Dynamic Tree Structures for Visual Contexts

KaihuaTang/Scene-Graph-Benchmark.pytorch CVPR 2019

We propose to compose dynamic tree structures that place the objects in an image into a visual context, helping visual reasoning tasks such as scene graph generation and visual Q&A.

VisualBERT: A Simple and Performant Baseline for Vision and Language

uclanlp/visualbert 9 Aug 2019

We propose VisualBERT, a simple and flexible framework for modeling a broad range of vision-and-language tasks.

CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

ethanjperez/film CVPR 2017

When building artificial intelligence systems that can reason and answer questions about visual data, we need diagnostic tests to analyze our progress and discover shortcomings.

FiLM: Visual Reasoning with a General Conditioning Layer

ethanjperez/film 22 Sep 2017

We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation.
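The FiLM operation itself is a per-channel affine transform whose scale and shift are predicted from a conditioning input (e.g. a question embedding). A minimal NumPy sketch, where the conditioning network is reduced to a single hypothetical linear projection:

```python
import numpy as np

def film(features, gamma, beta):
    """Feature-wise Linear Modulation: scale and shift each feature
    channel using coefficients predicted from a conditioning input.

    features: (batch, channels, height, width) feature map
    gamma, beta: (batch, channels) per-channel coefficients
    """
    # Broadcast the per-channel coefficients over the spatial dims.
    return gamma[:, :, None, None] * features + beta[:, :, None, None]

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 8, 8))       # image features
cond = rng.standard_normal((2, 16))         # e.g. a question embedding
W_g, W_b = rng.standard_normal((2, 16, 4))  # hypothetical projection weights
gamma, beta = cond @ W_g, cond @ W_b        # FiLM generator (here: linear)
y = film(x, gamma, beta)
print(y.shape)  # (2, 4, 8, 8)
```

In the paper, the FiLM generator is a recurrent network over the question, and the modulation is applied inside each residual block of the image pipeline; the affine form above is the same.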

GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering

stanfordnlp/mac-network CVPR 2019

We introduce GQA, a new dataset for real-world visual reasoning and compositional question answering, seeking to address key shortcomings of previous VQA datasets.

CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions

ruotianluo/iep-ref CVPR 2019

There is evidence that current benchmark datasets suffer from bias, and that current state-of-the-art models cannot easily be evaluated on their intermediate reasoning process.

Learning by Abstraction: The Neural State Machine

stanfordnlp/mac-network NeurIPS 2019

We introduce the Neural State Machine, seeking to bridge the gap between the neural and symbolic views of AI and integrate their complementary strengths for the task of visual reasoning.