Learning to Reason: End-to-End Module Networks for Visual Question Answering

ICCV 2017. Ronghang Hu, Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Kate Saenko

Natural language questions are inherently compositional, and many are most easily answered by reasoning about their decomposition into modular sub-problems. For example, to answer "is there an equal number of balls and boxes?" ...
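To make the idea of decomposing a question into modular sub-problems concrete, here is a minimal sketch in Python. It is only an illustration, not the paper's model: the scene is a hypothetical list of object labels standing in for an image, and the helper names `find`, `count`, `equal`, and `answer_equal_number` are assumptions introduced for this example. The model described by the paper works on image features with learned neural modules, which this toy does not attempt to reproduce.

```python
from typing import List

# Hypothetical toy scene: a list of object labels standing in for an image.
Scene = List[str]


def find(scene: Scene, label: str) -> List[str]:
    """Sub-problem 1: select the objects that match a label."""
    return [obj for obj in scene if obj == label]


def count(objects: List[str]) -> int:
    """Sub-problem 2: count the selected objects."""
    return len(objects)


def equal(a: int, b: int) -> str:
    """Sub-problem 3: compare two counts and produce the answer."""
    return "yes" if a == b else "no"


def answer_equal_number(scene: Scene, first: str, second: str) -> str:
    """Compose the sub-problems: equal(count(find(first)), count(find(second)))."""
    return equal(count(find(scene, first)), count(find(scene, second)))


if __name__ == "__main__":
    scene = ["ball", "box", "ball", "box", "cylinder"]
    # "Is there an equal number of balls and boxes?"
    print(answer_equal_number(scene, "ball", "box"))
```

The point of the sketch is the composition in `answer_equal_number`: the same three small pieces can be rearranged to answer other questions, which is the intuition behind treating a question as a layout of reusable modules.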

TASK | DATASET | MODEL | METRIC NAME | METRIC VALUE | GLOBAL RANK
Visual Question Answering | VQA v2 test-dev | N2NMN (ResNet-152, policy search) | Accuracy | 64.9 | #16