VinVL+L: Enriching Visual Representation with Location Context in VQA
In this paper, we describe a novel method - VinVL+L - that enriches the visual representations (i.e. object tags and region features) of the State-of-the-Art Vision and Language (VL) method - VinVL - with Location information. To verify the importance of such metadata for VL models, we (i) trained a Swin-B model on the Places365 dataset and obtained additional sets of visual and tag features; both were made public to allow reproducibility and further experiments, (ii) updated the architecture of the existing VinVL method to include the new feature sets, and (iii) provided a qualitative and quantitative evaluation. By including just binary location metadata, the VinVL+L method yields an incremental improvement over the State-of-the-Art VinVL in Visual Question Answering (VQA). VinVL+L achieved an accuracy of 64.85% on the GQA dataset, improving over VinVL by +0.32% in accuracy; the statistical significance of the new representations is verified via Approximate Randomization. The code and newly generated sets of features are available at https://github.com/vyskocj/VinVL-L.
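The Approximate Randomization test mentioned above can be sketched as follows. This is not the authors' implementation, just a minimal illustration of the general technique: the per-example correctness of two systems is repeatedly and randomly swapped, and the p-value is estimated as the fraction of shuffles whose accuracy gap is at least as large as the observed one. The function name and signature are hypothetical.

```python
import random

def approximate_randomization(preds_a, preds_b, labels, n_iter=10000, seed=0):
    """Two-sided approximate randomization test on the accuracy difference.

    preds_a, preds_b: per-example predictions of the two systems.
    labels: gold answers. Returns the estimated p-value.
    """
    rng = random.Random(seed)
    # Per-example correctness indicators for each system.
    correct_a = [int(p == y) for p, y in zip(preds_a, labels)]
    correct_b = [int(p == y) for p, y in zip(preds_b, labels)]
    n = len(labels)
    observed = abs(sum(correct_a) - sum(correct_b)) / n
    count = 0
    for _ in range(n_iter):
        diff = 0
        # Swap the two systems' outcomes on each example with probability 0.5.
        for a, b in zip(correct_a, correct_b):
            if rng.random() < 0.5:
                diff += a - b
            else:
                diff += b - a
        if abs(diff) / n >= observed:
            count += 1
    # Add-one smoothing so the estimated p-value is never exactly zero.
    return (count + 1) / (n_iter + 1)
```

A small p-value (e.g. below 0.05) indicates that the observed accuracy gap between the two systems is unlikely to arise from random assignment alone.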
Results from the Paper
Task | Dataset | Model | Metric Name | Metric Value | Global Rank
---|---|---|---|---|---
Visual Question Answering (VQA) | GQA Test2019 | VinVL+L | Accuracy | 64.85 | # 10
Visual Question Answering (VQA) | GQA Test2019 | VinVL+L | Binary | 82.59 | # 5
Visual Question Answering (VQA) | GQA Test2019 | VinVL+L | Open | 49.19 | # 12
Visual Question Answering (VQA) | GQA Test2019 | VinVL+L | Consistency | 94.0 | # 5
Visual Question Answering (VQA) | GQA Test2019 | VinVL+L | Plausibility | 84.91 | # 31
Visual Question Answering (VQA) | GQA Test2019 | VinVL+L | Validity | 96.62 | # 7
Visual Question Answering (VQA) | GQA Test2019 | VinVL+L | Distribution | 4.59 | # 118