MAttNet: Modular Attention Network for Referring Expression Comprehension

In this paper, we address referring expression comprehension: localizing an image region described by a natural language expression. While most recent work treats expressions as a single unit, we propose to decompose them into three modular components related to subject appearance, location, and relationship to other objects...

CVPR 2018

| Task | Dataset | Model | Metric name | Metric value | Global rank |
|---|---|---|---|---|---|
| Referring Expression Segmentation | RefCOCO val | MAttNet | IoU | 56.51 | #8 |
| Referring Expression Segmentation | RefCOCO testA | MAttNet | IoU | 62.37 | #4 |
| Referring Expression Segmentation | RefCOCO testB | MAttNet | IoU | 51.70 | #7 |
| Referring Expression Segmentation | RefCOCO+ val | MAttNet | Overall IoU | 46.67 | #4 |
| Referring Expression Segmentation | RefCOCO+ testA | MAttNet | Overall IoU | 52.39 | #3 |
| Referring Expression Segmentation | RefCOCO+ testB | MAttNet | Overall IoU | 40.08 | #4 |
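
The results above are reported as IoU scores for segmentation masks, apparently as percentages. This page does not define the metrics, so the following is only a minimal sketch of how such scores are commonly computed. It assumes binary masks given as nested lists of 0/1 values, and it assumes "Overall IoU" means total intersection divided by total union accumulated over all examples, which is a common convention in referring segmentation work but is not confirmed here. The function names `mask_counts`, `mask_iou`, and `overall_iou` are hypothetical and not taken from the paper's code.

```python
from typing import Iterable, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # binary mask: rows of 0/1 values


def mask_counts(pred: Mask, gt: Mask) -> Tuple[int, int]:
    """Return (intersection, union) pixel counts for two same-shaped masks."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("masks must have the same shape")
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            p_on, g_on = bool(p), bool(g)
            inter += int(p_on and g_on)
            union += int(p_on or g_on)
    return inter, union


def mask_iou(pred: Mask, gt: Mask) -> float:
    """IoU of a single predicted mask against its ground truth."""
    inter, union = mask_counts(pred, gt)
    return inter / union if union else 0.0


def overall_iou(pairs: Iterable[Tuple[Mask, Mask]]) -> float:
    """Assumed definition: summed intersection over summed union across examples."""
    total_inter = 0
    total_union = 0
    for pred, gt in pairs:
        inter, union = mask_counts(pred, gt)
        total_inter += inter
        total_union += union
    return total_inter / total_union if total_union else 0.0


# Toy 2x2 example (made-up masks, not data from the benchmark).
pred = [[1, 1], [0, 0]]
gt = [[1, 0], [1, 0]]
single = mask_iou(pred, gt)                      # 1 shared pixel, 3 in the union
overall = overall_iou([(pred, gt), (gt, gt)])    # (1 + 2) / (3 + 2)
```

Under this assumed definition, examples with large masks contribute more to the overall score than small ones, whereas averaging per-example IoU would weight every example equally.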