REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering

2 Jun 2022  ·  Yuanze Lin, Yujia Xie, Dongdong Chen, Yichong Xu, Chenguang Zhu, Lu Yuan

This paper revisits visual representation in knowledge-based visual question answering (VQA) and demonstrates that making better use of regional information can significantly improve performance. While visual representation is extensively studied in traditional VQA, it is under-explored in knowledge-based VQA, even though the two tasks share a common spirit, i.e., both rely on visual input to answer the question. Specifically, we observe that in most state-of-the-art knowledge-based VQA methods: 1) visual features are extracted either from the whole image or in a sliding-window manner for retrieving knowledge, neglecting the important relationships within and among object regions; 2) visual features are not well utilized in the final answering model, which is counter-intuitive to some extent. Based on these observations, we propose a new knowledge-based VQA method, REVIVE, which exploits the explicit information of object regions not only in the knowledge retrieval stage but also in the answering model. The key motivation is that object regions and their inherent relationships are important for knowledge-based VQA. We perform extensive experiments on the standard OK-VQA dataset and achieve new state-of-the-art performance, i.e., 58.0% accuracy, surpassing the previous state-of-the-art method by a large margin (+3.6%). We also conduct detailed analysis and show the necessity of regional information in different framework components for knowledge-based VQA. Code is publicly available at https://github.com/yzleroy/REVIVE.
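To make the region-based idea concrete, below is a minimal PyTorch sketch of region-aware knowledge retrieval, the core contrast with whole-image or sliding-window retrieval that the abstract highlights: each detected object region becomes its own query against a pre-encoded knowledge base. All names here (encode_regions, retrieve_knowledge, the CLIP-style encoder, the T5-style decoder) are illustrative assumptions, not the authors' actual implementation; see the linked repository for the real code.

```python
# Illustrative sketch only: module/function names are hypothetical,
# not REVIVE's actual API.
import torch
import torch.nn.functional as F

def encode_regions(region_crops, region_encoder):
    """Encode detected object regions into normalized feature vectors.

    region_crops: (num_regions, 3, H, W) tensor of crops, e.g., from an
    off-the-shelf detector. region_encoder is any visual backbone
    (a CLIP-like encoder would be a natural choice).
    """
    feats = region_encoder(region_crops)          # (num_regions, d)
    return F.normalize(feats, dim=-1)

def retrieve_knowledge(region_feats, kb_embeddings, top_k=5):
    """Retrieve knowledge entries using per-region queries.

    kb_embeddings: (num_entries, d) pre-encoded knowledge base.
    Querying per region, rather than with one whole-image feature,
    is the key difference the abstract emphasizes.
    """
    sims = region_feats @ kb_embeddings.T         # (num_regions, num_entries)
    best_per_entry = sims.max(dim=0).values       # best-matching region per entry
    scores, idx = best_per_entry.topk(top_k)
    return idx, scores

# Toy usage with random tensors standing in for real encoders and data.
if __name__ == "__main__":
    d, num_regions, num_entries = 512, 4, 1000
    region_feats = F.normalize(torch.randn(num_regions, d), dim=-1)
    kb = F.normalize(torch.randn(num_entries, d), dim=-1)
    idx, scores = retrieve_knowledge(region_feats, kb)
    # The retrieved entries, region features, and the question would then
    # be fed jointly to an answer generator (e.g., a T5-style decoder),
    # so regional information is used in the answering model as well.
    print(idx.tolist(), [round(s, 3) for s in scores.tolist()])
```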

Task                             Dataset  Model              Metric    Value  Rank
Visual Question Answering (VQA)  OK-VQA   REVIVE (Ensemble)  Accuracy  58.0%  #11
Visual Question Answering (VQA)  OK-VQA   REVIVE (Single)    Accuracy  56.6%  #12
