Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge

CVPR 2018. Damien Teney, Peter Anderson, Xiaodong He, Anton van den Hengel

This paper presents a state-of-the-art model for visual question answering (VQA), which won first place in the 2017 VQA Challenge. VQA is a task of significant importance for research in artificial intelligence, given its multimodal nature, clear evaluation protocol, and potential real-world applications...

Results from the Paper


TASK                       DATASET          MODEL                                                           METRIC NAME  METRIC VALUE  GLOBAL RANK
Visual Question Answering  VizWiz           Pythia v0.3                                                     Accuracy     54.72%        # 2
Visual Question Answering  VQA v2 test-dev  Image features from bottom-up attention (adaptive K, ensemble)  Accuracy     69.87%        # 9
Visual Question Answering  VQA v2 test-std  Image features from bottom-up attention (adaptive K, ensemble)  Accuracy     70.3%         # 10