Towards VQA Models That Can Read

Studies have shown that a dominant class of questions asked by visually impaired users on images of their surroundings involves reading text in the image. But today's VQA models cannot read!

CVPR 2019

Datasets


TASK                       DATASET          MODEL                METRIC NAME  METRIC VALUE  GLOBAL RANK
Visual Question Answering  VizWiz 2018      Pythia v0.3          overall      54.72         # 3
Visual Question Answering  VQA v2 test-dev  Pythia v0.3 + LoRRA  Accuracy     69.21         # 12
