Zero-shot Visual Question Answering using Knowledge Graph

12 Jul 2021 · Zhuo Chen, Jiaoyan Chen, Yuxia Geng, Jeff Z. Pan, Zonggang Yuan, Huajun Chen

Incorporating external knowledge into Visual Question Answering (VQA) has become a vital practical need. Existing methods mostly adopt pipeline approaches with separate components for knowledge matching and extraction, feature learning, etc. However, such pipeline approaches suffer when any component performs poorly, which leads to error propagation and poor overall performance. Furthermore, the majority of existing approaches ignore the answer bias issue: in real-world applications, many answers may never have appeared during training (i.e., unseen answers). To bridge these gaps, we propose a Zero-shot VQA algorithm that uses knowledge graphs and a mask-based learning mechanism to better incorporate external knowledge, and we present new answer-based Zero-shot VQA splits for the F-VQA dataset. Experiments show that our method achieves state-of-the-art performance in Zero-shot VQA with unseen answers, while also dramatically improving existing end-to-end models on the normal F-VQA task.
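The abstract does not spell out how the mask-based mechanism works, so the following is only a generic sketch of one way a "hard" mask over answer scores could be applied. It assumes a model has already produced one score per answer in the vocabulary, and that some external source (for example, a knowledge-graph lookup) has produced a set of candidate answer indices. The names `apply_hard_mask`, `predict_answer`, and `candidate_ids` are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence, Set


def apply_hard_mask(scores: Sequence[float], candidate_ids: Set[int]) -> List[float]:
    """Keep scores of candidate answers; set every other score to -inf.

    Assumption: `candidate_ids` comes from an external knowledge source.
    This is an illustrative hard mask, not the paper's exact formulation.
    """
    if not candidate_ids:
        raise ValueError("candidate_ids must contain at least one answer index")
    return [s if i in candidate_ids else -math.inf for i, s in enumerate(scores)]


def predict_answer(scores: Sequence[float], candidate_ids: Set[int]) -> int:
    """Return the index of the highest-scoring answer among the candidates."""
    masked = apply_hard_mask(scores, candidate_ids)
    return max(range(len(masked)), key=lambda i: masked[i])


# Toy example: answer 0 has the highest raw score, but only answers 1 and 2
# are in the candidate set, so the masked prediction is answer 2.
raw_scores = [2.0, 0.5, 1.5, 0.1]
prediction = predict_answer(raw_scores, {1, 2})
```

Under this sketch, an answer that never appeared during training can still be selected as long as it is in the candidate set, which is the intuition behind restricting the answer space with external knowledge.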




Used in the Paper: Visual Question Answering, OK-VQA, KVQA
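| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Visual Question Answering (VQA) | F-VQA | ZS-F-VQA | Accuracy | 88.49 | # 1 |
| Visual Question Answering (VQA) | F-VQA | ZS-F-VQA | MRR | 0.685 | # 1 |
| Visual Question Answering (VQA) | F-VQA | ZS-F-VQA | MR | 9.17 | # 1 |
| Visual Question Answering (VQA) | F-VQA | ZS-F-VQA | Top-1 Accuracy | 58.27 | # 1 |
| Visual Question Answering (VQA) | F-VQA | ZS-F-VQA | Top-3 Accuracy | 76.51 | # 1 |
| Visual Question Answering (VQA) | ZS-F-VQA | SAN † - hard mask | Top-1 Accuracy | 29.39 | # 1 |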
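For readers unfamiliar with the ranking metrics in the results above, here is a minimal sketch of how MR (mean rank), MRR (mean reciprocal rank), and Top-k accuracy are conventionally computed from the rank of the correct answer for each question. The function name `ranking_metrics` and the input format (a list of 1-based ranks) are assumptions for illustration; the evaluation protocol behind the listed numbers may differ in detail.

```python
from typing import Dict, Sequence


def ranking_metrics(ranks: Sequence[int], ks: Sequence[int] = (1, 3)) -> Dict[str, float]:
    """Compute standard ranking metrics from 1-based ranks of the correct answer.

    MR   : mean rank (lower is better)
    MRR  : mean reciprocal rank, in [0, 1] (higher is better)
    Top-k: percentage of questions whose correct answer is ranked within k
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based and must be >= 1")

    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,
        "MRR": sum(1.0 / r for r in ranks) / n,
    }
    for k in ks:
        metrics[f"Top-{k} Accuracy"] = 100.0 * sum(r <= k for r in ranks) / n
    return metrics


# Toy example with four questions whose correct answers are ranked 1, 2, 4, 1.
example = ranking_metrics([1, 2, 4, 1])
# MR = 2.0, MRR = 0.6875, Top-1 Accuracy = 50.0, Top-3 Accuracy = 75.0
```

Top-k accuracy is expressed here as a percentage and MRR as a fraction, matching the scale of the values listed in the results.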

