TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Visual Question Answering (VQA)	CLEVR	QGHC+Att+Concat	Accuracy	65.90	# 14
Visual Question Answering (VQA)	COCO Visual Question Answering (VQA) real images 1.0 open ended	QGHC+Att+Concat	Percentage correct	65.90	# 3

Question-Guided Hybrid Convolution for Visual Question Answering

ECCV 2018 · Peng Gao, Pan Lu, Hongsheng Li, Shuang Li, Yikang Li, Steven Hoi, Xiaogang Wang

In this paper, we propose a novel Question-Guided Hybrid Convolution (QGHC) network for Visual Question Answering (VQA). Most state-of-the-art VQA methods fuse the high-level textual and visual features from the neural network and abandon the visual spatial information when learning multi-modal features. To address these problems, question-guided kernels generated from the input question are convolved with the visual features to capture the textual and visual relationship at an early stage. The question-guided convolution can tightly couple the textual and visual information, but it also introduces more parameters when learning kernels. We apply group convolution, which consists of question-independent kernels and question-dependent kernels, to reduce the parameter size and alleviate over-fitting. The hybrid convolution can generate discriminative multi-modal features with fewer parameters. The proposed approach is also complementary to existing bilinear pooling fusion and attention-based VQA methods. By integrating with them, our method can further boost performance. Extensive experiments on public VQA datasets validate the effectiveness of QGHC.
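The hybrid convolution described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes 1x1 kernels, a plain linear layer as the kernel predictor, and that the first `n_dynamic` channel groups are the question-dependent ones. The names `conv1x1`, `predict_kernel`, `hybrid_conv`, and `predictor_params` are hypothetical, and the sizes in the parameter count are illustrative rather than taken from the paper.

```python
def conv1x1(x, w):
    """1x1 convolution. x: [C_in][P] (P flattened spatial positions), w: [C_out][C_in]."""
    n_pos = len(x[0])
    return [[sum(w_oc * x_c[p] for w_oc, x_c in zip(row, x)) for p in range(n_pos)]
            for row in w]


def predict_kernel(q, proj, c_out, c_in):
    """Question-dependent kernel: a linear map of the question vector q,
    reshaped to [c_out][c_in]. proj: [c_out * c_in][len(q)] (assumed predictor)."""
    flat = [sum(p_i * q_i for p_i, q_i in zip(row, q)) for row in proj]
    return [flat[o * c_in:(o + 1) * c_in] for o in range(c_out)]


def hybrid_conv(x, q, static_kernels, proj, n_dynamic):
    """Group convolution in which the first n_dynamic groups use kernels
    predicted from q and the remaining groups use question-independent kernels."""
    groups = n_dynamic + len(static_kernels)
    g = len(x) // groups  # channels per group (same for input and output here)
    out = []
    for i in range(groups):
        x_group = x[i * g:(i + 1) * g]
        if i < n_dynamic:
            w = predict_kernel(q, proj[i], g, g)
        else:
            w = static_kernels[i - n_dynamic]
        out.extend(conv1x1(x_group, w))
    return out


def predictor_params(q_dim, c_in, c_out, groups, n_dynamic):
    """Weights needed by a linear predictor that emits the 1x1 kernels
    of n_dynamic groups out of `groups` (biases ignored)."""
    return q_dim * n_dynamic * (c_in // groups) * (c_out // groups)


# Toy usage: 4 channels, 2 spatial positions, 1 dynamic group + 1 static group.
features = [[1, 2], [3, 4], [5, 6], [7, 8]]
kernel_proj = [[[1, 0], [0, 0], [0, 0], [0, 1]]]  # one predictor, q_dim = 2
identity = [[[1, 0], [0, 1]]]                     # one static 2x2 kernel
print(hybrid_conv(features, [1, 0], identity, kernel_proj, n_dynamic=1))
print(hybrid_conv(features, [0, 1], identity, kernel_proj, n_dynamic=1))

# Predicting a full kernel vs. only half of 8 groups (illustrative sizes).
print(predictor_params(2048, 512, 512, groups=1, n_dynamic=1))
print(predictor_params(2048, 512, 512, groups=8, n_dynamic=4))
```

In the toy usage, changing the question vector changes only the output channels of the dynamic group, while the static group's output stays fixed. The parameter count shows why grouping helps: the predictor only has to emit small per-group kernels, and only for the question-dependent groups.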

Code

No code implementations yet.

Tasks

Question Answering

Visual Question Answering

Visual Question Answering (VQA)

Datasets

MS COCO

Visual Question Answering

CLEVR

Results from the Paper

Ranked #14 on Visual Question Answering (VQA) on CLEVR

Methods

Convolution
