Modeling Relationships in Referential Expressions with Compositional Modular Networks

People often refer to entities in an image in terms of their relationships with other entities. For example, "the black cat sitting under the table" refers to both a "black cat" entity and its relationship with another "table" entity... (read more)

PDF Abstract CVPR 2017 PDF CVPR 2017 Abstract
TASK DATASET MODEL METRIC NAME METRIC VALUE GLOBAL RANK RESULT BENCHMARK
Visual Question Answering Visual7W CMN Percentage correct 72.53 # 1
Visual Question Answering Visual Genome (pairs) CMN Percentage correct 28.52 # 1
Visual Question Answering Visual Genome (subjects) CMN Percentage correct 44.24 # 1

Methods used in the Paper


METHOD TYPE
🤖 No Methods Found Help the community by adding them if they're not listed; e.g. Deep Residual Learning for Image Recognition uses ResNet