Neural Module Networks

Visual question answering is fundamentally compositional in nature: a question like "where is the dog?" shares substructure with questions like "what color is the dog?" and "where is the cat?" This paper seeks to simultaneously exploit the representational capacity of deep networks and the compositional linguistic structure of questions. We describe a procedure for constructing and learning *neural module networks*, which compose collections of jointly-trained neural "modules" into deep networks for question answering. Our approach decomposes questions into their linguistic substructures, and uses these structures to dynamically instantiate modular networks (with reusable components for recognizing dogs, classifying colors, etc.). The resulting compound networks are jointly trained. We evaluate our approach on two challenging datasets for visual question answering, achieving state-of-the-art results on both the VQA natural image dataset and a new dataset of complex questions about abstract shapes.
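
The abstract describes the mechanism at a high level; the sketch below makes it concrete. It is a minimal, hypothetical PyTorch illustration of the idea, not the authors' implementation: a reusable Find module produces an attention map for a word, a Describe module classifies an answer from the attended features, and the two are wired together per question. All module definitions, dimensions, and the fixed wiring for "what color is the dog?" are simplifying assumptions; in the paper, layouts are derived from the question's linguistic structure.

```python
# Minimal sketch of a neural module network (illustrative assumptions only;
# module names, dimensions, and wiring are hypothetical simplifications).
import torch
import torch.nn as nn

class Find(nn.Module):
    """Produce an attention map over image features for a word (e.g. 'dog')."""
    def __init__(self, feat_dim, vocab_size):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, feat_dim)
        self.conv = nn.Conv2d(feat_dim, 1, kernel_size=1)

    def forward(self, image_feats, word_idx):
        # image_feats: (B, C, H, W). Modulate features with the word
        # embedding, then collapse to a single-channel attention map.
        w = self.word_emb(word_idx)[:, :, None, None]     # (B, C, 1, 1)
        return torch.sigmoid(self.conv(image_feats * w))  # (B, 1, H, W)

class Describe(nn.Module):
    """Classify an answer (e.g. a color) from attention-weighted features."""
    def __init__(self, feat_dim, num_answers):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_answers)

    def forward(self, image_feats, attention):
        # Attention-weighted average pooling over space, then classify.
        weighted = (image_feats * attention).sum(dim=(2, 3))
        pooled = weighted / attention.sum(dim=(2, 3)).clamp(min=1e-6)
        return self.fc(pooled)

# Modules are shared across questions; only the wiring changes per question.
feat_dim, vocab_size, num_answers = 64, 100, 10
find = Find(feat_dim, vocab_size)
describe = Describe(feat_dim, num_answers)

image_feats = torch.randn(1, feat_dim, 7, 7)  # stand-in for CNN features
dog = torch.tensor([3])                       # hypothetical index of 'dog'

# "what color is the dog?"  ->  describe(find(image, 'dog'))
logits = describe(image_feats, find(image_feats, dog))  # (1, num_answers)
```

A question like "where is the cat?" would reuse the same Find parameters with a different word index and a different output module; this sharing of substructure across questions is what allows the modules to be trained jointly.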

CVPR 2016

Datasets

Introduced in the paper: SHAPES
Used in the paper: MS COCO, Visual Question Answering (VQA)

Results

Task                            | Dataset         | Model       | Metric   | Value | Rank
Visual Question Answering (VQA) | VQA v1 test-dev | NMN+LSTM+FT | Accuracy | 58.6  | #7
Visual Question Answering (VQA) | VQA v1 test-std | NMN+LSTM+FT | Accuracy | 58.7  | #6

Methods


No methods listed for this paper.