High-Order Attention Models for Visual Question Answering

NeurIPS 2017 · Idan Schwartz, Alexander G. Schwing, Tamir Hazan

The quest for algorithms that enable cognitive abilities is an important part of machine learning. A common trait in many recently investigated cognitive-like tasks is that they take into account different data modalities, such as visual and textual input. In this paper we propose a novel and generally applicable form of attention mechanism that learns high-order correlations between various data modalities. We show that high-order correlations effectively direct the appropriate attention to the relevant elements in the different data modalities that are required to solve the joint task. We demonstrate the effectiveness of our high-order attention mechanism on the task of visual question answering (VQA), where we achieve state-of-the-art performance on the standard VQA dataset.
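
To make the idea concrete, the following Python sketch scores every element of every modality with a unary term plus pairwise and ternary correlation terms involving the other two modalities, then applies a softmax to obtain one attention distribution per modality. It is an illustration only, not the authors' formulation (their code is in idansc/HighOrderAtten). The specific choices are assumptions: unary scores are dot products with hypothetical learned weights `unary_w`, pairwise potentials are cosine similarities, the ternary potential is a trilinear product of L2-normalized embeddings, and each potential is marginalized by averaging over the other modalities' elements. The three modalities are taken to be image regions, question words, and answer candidates, matching the three-modality multiple-choice model listed under the results.

```python
import math
from itertools import product
from typing import List, Tuple

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(a: Vec) -> Vec:
    # L2-normalize; an all-zero vector is left unchanged to avoid dividing by zero
    norm = math.sqrt(dot(a, a)) or 1.0
    return [x / norm for x in a]


def softmax(scores: List[float]) -> List[float]:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def high_order_attention(
    mods: List[List[Vec]], unary_w: List[Vec]
) -> List[Tuple[List[float], Vec]]:
    """Hypothetical high-order attention over exactly three modalities.

    mods[m] is a list of d-dimensional embeddings for modality m.
    unary_w[m] is a hypothetical learned weight vector for modality m.
    Returns (attention distribution, attended vector) per modality.
    """
    assert len(mods) == 3, "the ternary term assumes exactly three modalities"
    normed = [[normalize(x) for x in m] for m in mods]
    results = []
    for m in range(3):
        others = [n for n in range(3) if n != m]
        scores = []
        for x_raw, x in zip(mods[m], normed[m]):
            # Unary potential: relevance of the element on its own
            score = dot(unary_w[m], x_raw)
            # Pairwise potentials: cosine similarity to each element of the
            # other modality, marginalized by averaging (assumed choice)
            for n in others:
                score += sum(dot(x, y) for y in normed[n]) / len(normed[n])
            # Ternary potential: trilinear product with every pair of elements
            # from the two other modalities, marginalized by averaging
            a, b = normed[others[0]], normed[others[1]]
            tri = sum(
                sum(xi * yi * zi for xi, yi, zi in zip(x, y, z))
                for y, z in product(a, b)
            )
            score += tri / (len(a) * len(b))
            scores.append(score)
        probs = softmax(scores)
        # Attended summary: attention-weighted sum of the raw embeddings
        dim = len(mods[m][0])
        attended = [sum(p * v[d] for p, v in zip(probs, mods[m])) for d in range(dim)]
        results.append((probs, attended))
    return results


# Toy example with zero unary weights, so only the correlation terms matter
V = [[1.0, 0.0], [0.0, 1.0]]  # two image regions
Q = [[1.0, 0.0]]              # one question word
A = [[1.0, 0.0], [0.0, 1.0]]  # two answer candidates
out = high_order_attention([V, Q, A], [[0.0, 0.0]] * 3)
```

In the toy example, the image region and the answer candidate that agree with the question receive the larger attention weight, because the pairwise and ternary terms all favor them. Averaging is only one way to marginalize a potential; taking a maximum over the other modalities' elements would be an equally hypothetical alternative.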

Code

idansc/HighOrderAtten (official)

Tasks

Question Answering

Visual Question Answering

Visual Question Answering (VQA)

Datasets

MS COCO

Visual Question Answering

Results from the Paper

Ranked #4 on Visual Question Answering (VQA) on COCO Visual Question Answering (VQA) real images 1.0 multiple choice

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Visual Question Answering (VQA)	COCO Visual Question Answering (VQA) real images 1.0 multiple choice	3-Modalities: Unary + Pairwise + Ternary (ResNet)	Percentage correct	69.3	#4

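The results table reports "Percentage correct" (69.3). Assuming this is the standard VQA accuracy, under which a predicted answer earns full credit when at least three of the ten human annotators gave it and partial credit otherwise, averaged over every leave-one-annotator-out subset, the score can be sketched as below. In the multiple-choice setting the predicted answer is one of the provided candidates, but we assume it is scored the same way. The function names are hypothetical, and the official evaluation additionally normalizes answer strings (punctuation, articles, number words), which this sketch reduces to lowercasing and stripping whitespace.

```python
from typing import List


def vqa_accuracy(pred: str, human_answers: List[str]) -> float:
    """Score one predicted answer against the human answers for one question:
    min(1, matches / 3), averaged over each leave-one-annotator-out subset."""
    pred = pred.strip().lower()  # simplified normalization (assumption)
    answers = [a.strip().lower() for a in human_answers]
    per_subset = []
    for i in range(len(answers)):
        others = answers[:i] + answers[i + 1:]  # drop annotator i
        matches = sum(1 for a in others if a == pred)
        per_subset.append(min(1.0, matches / 3.0))
    return sum(per_subset) / len(per_subset)


def percentage_correct(preds: List[str], gts: List[List[str]]) -> float:
    """Mean per-question accuracy, expressed as a percentage."""
    scores = [vqa_accuracy(p, g) for p, g in zip(preds, gts)]
    return 100.0 * sum(scores) / len(scores)


# Two of ten annotators answered "red":
# partial credit = (2 * 1/3 + 8 * 2/3) / 10 = 0.6
answers = ["red"] * 2 + ["blue"] * 8
score = vqa_accuracy("red", answers)
```

The leave-one-out averaging makes the score depend smoothly on how many annotators agree: two matching annotators give 0.6, three give 0.9, and four or more give full credit.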
Methods

FGA
