Stacked Attention Networks for Image Question Answering

CVPR 2016 · Zichao Yang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Smola

This paper presents stacked attention networks (SANs) that learn to answer natural language questions from images. SANs use the semantic representation of a question as a query to search for the regions in an image that are related to the answer. We argue that image question answering (QA) often requires multiple steps of reasoning. Thus, we develop a multiple-layer SAN in which we query an image multiple times to infer the answer progressively. Experiments conducted on four image QA data sets demonstrate that the proposed SANs significantly outperform previous state-of-the-art approaches. Visualizing the attention layers shows how the SAN progressively locates, layer by layer, the relevant visual clues that lead to the answer.
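The abstract describes the mechanism only at a high level: the question vector acts as a query over image regions, and stacking layers lets the model query the image several times. Below is a minimal pure-Python sketch of that idea, under stated assumptions. The image regions and the question are taken to be already encoded as vectors of the same size d (the encoders are not shown); each hop scores regions with a tanh-based additive attention; and the refined query is the attention-weighted image vector added to the previous query. The answer step treats answering as classification over a fixed answer set. All function names, sizes, and the randomly initialized (untrained) weights are hypothetical and for illustration only; this is not the authors' released code.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix W (a list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_hop(regions, query, params):
    """One attention layer: weight the image regions by their relevance
    to the current query, then fold the attended image vector into the query."""
    W_img, W_q, b_q, w_p = params
    q_proj = [a + b for a, b in zip(matvec(W_q, query), b_q)]
    scores = []
    for v in regions:
        # Assumed additive scoring: tanh(W_img v + W_q q + b_q), reduced to a scalar by w_p.
        h = [math.tanh(a + b) for a, b in zip(matvec(W_img, v), q_proj)]
        scores.append(sum(w * hi for w, hi in zip(w_p, h)))
    p = softmax(scores)  # attention distribution over the regions
    d = len(query)
    attended = [sum(p[i] * regions[i][j] for i in range(len(regions))) for j in range(d)]
    # Assumed refinement rule: attended image vector plus the previous query.
    refined = [a + q for a, q in zip(attended, query)]
    return p, refined


def stacked_attention(regions, question_vec, layers):
    """Query the image once per layer, refining the query each time."""
    u = question_vec
    weights_per_layer = []
    for params in layers:
        p, u = attention_hop(regions, u, params)
        weights_per_layer.append(p)
    return u, weights_per_layer


def random_matrix(rng, rows, cols, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def make_layer(rng, d, k):
    """Hypothetical untrained parameters for one attention layer (hidden size k)."""
    return (random_matrix(rng, k, d), random_matrix(rng, k, d),
            [0.0] * k, [rng.uniform(-0.5, 0.5) for _ in range(k)])


# Toy example: m = 6 regions, feature size d = 4, two stacked attention layers.
rng = random.Random(0)
d, k, m, n_answers = 4, 3, 6, 5
regions = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(m)]
question = [rng.uniform(-1, 1) for _ in range(d)]
layers = [make_layer(rng, d, k) for _ in range(2)]

u, weights = stacked_attention(regions, question, layers)
W_ans = random_matrix(rng, n_answers, d)
answer_probs = softmax(matvec(W_ans, u))  # distribution over a fixed answer set
for layer_idx, p in enumerate(weights, start=1):
    print(f"layer {layer_idx} attention:", [round(x, 3) for x in p])
```

Each layer returns its own attention distribution over the regions; with trained weights these per-layer distributions are the kind of quantity an attention visualization would plot over the image grid. Here the weights are random, so the printed values only demonstrate the shapes and the data flow from one hop to the next.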

Code

zcyang/imageqa-san (104 stars)
abhshkdz/neural-vqa-attention
Shivanshu-Gupta/Visual-Question-Ans…
Cold-Winter/vqs
TingAnChien/san-vqa-tensorflow

Five of the 16 listed implementations are shown.

Tasks

Visual Question Answering (VQA)

Datasets

MS COCO

COCO-QA

Results from the Paper

Ranked #5 on Visual Question Answering (VQA) on VQA v1 test-std

Task	Dataset	Model	Metric Name	Metric Value (%)	Global Rank
Visual Question Answering (VQA)	COCO Visual Question Answering (VQA) real images 1.0 open ended	SAN	Percentage correct	58.9	#11
Visual Question Answering (VQA)	VQA v1 test-std	SAN (VGG)	Accuracy	58.9	#5
