Visual Reasoning

12 papers with code · Reasoning

State-of-the-art leaderboards

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Greatest papers with code

Compositional Attention Networks for Machine Reasoning

ICLR 2018 stanfordnlp/mac-network

We present the MAC network, a novel fully differentiable neural network architecture, designed to facilitate explicit and expressive reasoning. MAC moves away from monolithic black-box neural architectures towards a design that encourages both transparency and versatility.

VISUAL REASONING

FiLM: Visual Reasoning with a General Conditioning Layer

22 Sep 2017 ethanjperez/film

We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation. FiLM layers influence neural network computation via a simple, feature-wise affine transformation based on conditioning information.

VISUAL REASONING
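
The feature-wise affine transformation described in the FiLM entry above can be illustrated with a small sketch. This is not code from ethanjperez/film: the function name `film` and the hand-picked `gamma` and `beta` values are assumptions for illustration. In the method itself these scale and shift values would be predicted from the conditioning information, which this sketch does not model.

```python
from typing import List, Sequence


def film(features: Sequence[Sequence[float]],
         gamma: Sequence[float],
         beta: Sequence[float]) -> List[List[float]]:
    """Feature-wise affine transform: out[c][i] = gamma[c] * x[c][i] + beta[c].

    `features` holds one list of activations per feature channel.
    `gamma` and `beta` hold one scale and one shift per channel
    (hypothetical, hand-written values standing in for learned ones).
    """
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("need one (gamma, beta) pair per feature channel")
    return [[g * x + b for x in channel]
            for channel, g, b in zip(features, gamma, beta)]


# Two feature channels with three positions each.
features = [[1.0, 2.0, 3.0],
            [4.0, 5.0, 6.0]]
gamma = [2.0, 0.0]   # scale channel 0, switch channel 1 off
beta = [0.5, -1.0]   # shift each channel
modulated = film(features, gamma, beta)
print(modulated)
```

Setting a channel's scale to zero, as for the second channel here, replaces that channel with its shift value, which shows how a per-feature scale and shift can selectively suppress or emphasize features.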

Inferring and Executing Programs for Visual Reasoning

ICCV 2017 ethanjperez/film

Existing methods for visual reasoning attempt to directly map inputs to outputs using black-box architectures without explicitly modeling the underlying reasoning processes. As a result, these black-box models often learn to exploit biases in the data rather than learning to perform visual reasoning.

VISUAL REASONING

CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

CVPR 2017 ethanjperez/film

When building artificial intelligence systems that can reason and answer questions about visual data, we need diagnostic tests to analyze our progress and discover shortcomings. We present a diagnostic dataset that tests a range of visual reasoning abilities.

QUESTION ANSWERING · VISUAL QUESTION ANSWERING · VISUAL REASONING

Object Level Visual Reasoning in Videos

ECCV 2018 fabienbaradel/object_level_visual_reasoning

Human activity recognition is typically addressed by detecting key concepts like global and local motion, features related to object classes present in the scene, as well as features related to the global context. The next open challenges in activity recognition require a level of understanding that pushes beyond this and call for models with capabilities for fine distinction and detailed comprehension of interactions between actors and objects in a scene.

HUMAN ACTIVITY RECOGNITION · OBJECT DETECTION · VISUAL REASONING

A Dataset and Architecture for Visual Reasoning with a Working Memory

ECCV 2018 google/cog

COG is much simpler than the general problem of video analysis, yet it addresses many of the problems relating to visual and logical reasoning and memory -- problems that remain challenging for modern deep learning architectures. We additionally propose a deep learning architecture that performs competitively on other diagnostic VQA datasets (i.e. CLEVR) as well as easy settings of the COG dataset.

VISUAL QUESTION ANSWERING · VISUAL REASONING

Mapping Natural Language Commands to Web Elements

EMNLP 2018 stanfordnlp/phrasenode

The web provides a rich, open-domain environment with textual, structural, and spatial properties. We propose a new task for grounding language in this environment: given a natural language command (e.g., "click on the second article"), choose the correct element on the web page (e.g., a hyperlink or text box).

RELATIONAL REASONING · VISUAL REASONING
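
The task in the entry above (pick the page element that a command refers to) can be made concrete with a toy baseline. This is a minimal sketch and not the model from stanfordnlp/phrasenode: the element dictionaries, the `tokenize` helper, and the token-overlap scoring rule are all assumptions chosen for illustration.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def choose_element(command: str, elements: List[Dict[str, str]]) -> Dict[str, str]:
    """Return the element whose text shares the most tokens with the command.

    Ties go to the earliest element, because max() keeps the first maximum.
    """
    command_tokens = tokenize(command)
    return max(elements,
               key=lambda el: len(command_tokens & tokenize(el["text"])))


# Hypothetical page elements, each reduced to a tag name and its visible text.
elements = [
    {"tag": "a", "text": "Sign in"},
    {"tag": "input", "text": "Search articles"},
    {"tag": "a", "text": "Contact us"},
]
chosen = choose_element("type into the search box", elements)
print(chosen["tag"], "-", chosen["text"])
```

A purely lexical rule like this cannot handle a command such as "click on the second article", which depends on the order and layout of elements rather than on their text. That gap is where the structural and spatial properties mentioned in the abstract become relevant.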

Learning Visual Reasoning Without Strong Priors

10 Jul 2017 GuessWhatGame/clevr

Achieving artificial visual reasoning - the ability to answer image-related questions which require a multi-step, high-level process - is an important step towards artificial general intelligence. Previous work has operated under the assumption that visual reasoning calls for a specialized architecture, but we show that a general architecture with proper conditioning can learn to visually reason effectively.

VISUAL REASONING

Cascaded Mutual Modulation for Visual Reasoning

EMNLP 2018 FlamingHorizon/CMM-VR

We propose CMM: Cascaded Mutual Modulation as a novel end-to-end visual reasoning model. Our code is available at https://github.com/FlamingHorizon/CMM-VR.

QUESTION ANSWERING · VISUAL QUESTION ANSWERING · VISUAL REASONING

Visual Reasoning by Progressive Module Networks

ICLR 2019 seung-kim/pmn_demo

Humans learn to solve tasks of increasing complexity by building on top of previously acquired knowledge. In Progressive Module Networks, a module for a new task learns to query existing modules and composes their outputs in order to produce its own output.

VISUAL REASONING
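
The idea in the entry above, a new module that queries existing modules and composes their outputs, can be sketched with plain functions. This is an illustration only and not code from seung-kim/pmn_demo: nothing here is learned, and the scene format and the module names `find_objects`, `count`, and `more_than` are assumptions.

```python
from typing import Dict, List

Scene = List[Dict[str, str]]


def find_objects(scene: Scene, color: str) -> Scene:
    """Lowest-level module: return the objects with the given color."""
    return [obj for obj in scene if obj["color"] == color]


def count(scene: Scene, color: str) -> int:
    """Mid-level module: queries find_objects and counts what it returns."""
    return len(find_objects(scene, color))


def more_than(scene: Scene, color_a: str, color_b: str) -> bool:
    """Top-level module: queries count twice and compares the two answers."""
    return count(scene, color_a) > count(scene, color_b)


# Hypothetical scene description with two red objects and one blue object.
scene = [
    {"shape": "cube", "color": "red"},
    {"shape": "sphere", "color": "red"},
    {"shape": "cylinder", "color": "blue"},
]
answer = more_than(scene, "red", "blue")
print(answer)
```

Each module sees only the inputs and outputs of the module it calls, so a module for a harder task reuses the easier ones instead of solving the whole problem from scratch.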