Neural Module Networks

CVPR 2016 · Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Dan Klein

Visual question answering is fundamentally compositional in nature: a question like "where is the dog?" shares substructure with questions like "what color is the dog?" ...

TASK                       DATASET          MODEL        METRIC NAME  METRIC VALUE  GLOBAL RANK
Visual Question Answering  VQA v1 test-dev  NMN+LSTM+FT  Accuracy     58.6          #7
Visual Question Answering  VQA v1 test-std  NMN+LSTM+FT  Accuracy     58.7          #6
