Towards VQA Models That Can Read

CVPR 2019 · Amanpreet Singh, Vivek Natarajan, Meet Shah, Yu Jiang, Xinlei Chen, Dhruv Batra, Devi Parikh, Marcus Rohrbach

Studies have shown that a dominant class of questions asked by visually impaired users about images of their surroundings involves reading text in the image. But today's VQA models cannot read!

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Visual Question Answering | TextVQA Test | Pythia + LoRRA | Accuracy | 27.63 | #1 |
| Visual Question Answering | TextVQA Val | Pythia + LoRRA | Accuracy | 26.56 | #1 |
| Visual Question Answering | VizWiz | Pythia v0.3 (Ours) | Accuracy | 54.72% | #2 |
| Visual Question Answering | VQA v2 test-dev | Pythia v0.3 + LoRRA | Accuracy | 69.21 | #10 |