Visual Coreference Resolution in Visual Dialog using Neural Module Networks

Visual dialog entails answering a series of questions grounded in an image, using dialog history as context. In addition to the challenges found in visual question answering (VQA), which can be seen as one-round dialog, visual dialog encompasses several more...

ECCV 2018
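| TASK | DATASET | MODEL | METRIC NAME | METRIC VALUE | GLOBAL RANK |
|---|---|---|---|---|---|
| Visual Dialog | VisDial v0.9 val | CorefNMN | MRR | 63.6 | #5 |
| Visual Dialog | VisDial v0.9 val | CorefNMN | Mean Rank | 4.53 | #10 |
| Visual Dialog | VisDial v0.9 val | CorefNMN | R@1 | 50.24 | #9 |
| Visual Dialog | VisDial v0.9 val | CorefNMN | R@10 | 88.51 | #9 |
| Visual Dialog | VisDial v0.9 val | CorefNMN | R@5 | 79.81 | #10 |
| Visual Dialog | VisDial v0.9 val | CorefNMN (ResNet-152) | MRR | 64.1 | #3 |
| Visual Dialog | VisDial v0.9 val | CorefNMN (ResNet-152) | Mean Rank | 4.45 | #8 |
| Visual Dialog | VisDial v0.9 val | CorefNMN (ResNet-152) | R@1 | 50.92 | #7 |
| Visual Dialog | VisDial v0.9 val | CorefNMN (ResNet-152) | R@10 | 88.81 | #8 |
| Visual Dialog | VisDial v0.9 val | CorefNMN (ResNet-152) | R@5 | 80.18 | #9 |
| Common Sense Reasoning | Visual Dialog v0.9 | NMN [kottur2018visual] | 1 in 10 R@5 | 80.1 | #1 |
| Visual Dialog | Visual Dialog v1.0 test-std | CorefNMN (ResNet-152) | NDCG (x 100) | 54.70 | #34 |
| Visual Dialog | Visual Dialog v1.0 test-std | CorefNMN (ResNet-152) | MRR (x 100) | 61.50 | #16 |
| Visual Dialog | Visual Dialog v1.0 test-std | CorefNMN (ResNet-152) | R@1 | 47.55 | #16 |
| Visual Dialog | Visual Dialog v1.0 test-std | CorefNMN (ResNet-152) | R@5 | 78.10 | #14 |
| Visual Dialog | Visual Dialog v1.0 test-std | CorefNMN (ResNet-152) | R@10 | 88.80 | #13 |
| Visual Dialog | Visual Dialog v1.0 test-std | CorefNMN (ResNet-152) | Mean | 4.40 | #23 |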
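
For readers unfamiliar with the retrieval metrics reported above, the following is a minimal sketch of how MRR, recall at k, and mean rank are commonly computed from the rank a model assigns to the ground-truth answer among the candidate answers for each question. The function name `retrieval_metrics`, its parameters, and the example ranks are illustrative assumptions, not code from the paper or the benchmark. NDCG is left out because it additionally requires per-candidate relevance annotations.

```python
from typing import Dict, Sequence


def retrieval_metrics(
    gt_ranks: Sequence[int], ks: Sequence[int] = (1, 5, 10)
) -> Dict[str, float]:
    """Summarize 1-based ranks of the ground-truth answer (one rank per question).

    Hypothetical helper for illustration only; MRR and recall are scaled by 100
    to match the way the leaderboard reports them.
    """
    if not gt_ranks:
        raise ValueError("gt_ranks must not be empty")
    if any(r < 1 for r in gt_ranks):
        raise ValueError("ranks are 1-based, so every rank must be >= 1")

    n = len(gt_ranks)
    metrics = {
        # Mean reciprocal rank: average of 1/rank, higher is better.
        "MRR": 100.0 * sum(1.0 / r for r in gt_ranks) / n,
        # Mean rank of the ground-truth answer, lower is better.
        "Mean Rank": sum(gt_ranks) / n,
    }
    for k in ks:
        # Recall at k: share of questions whose ground truth is in the top k.
        metrics[f"R@{k}"] = 100.0 * sum(r <= k for r in gt_ranks) / n
    return metrics


# Made-up ranks for four questions, purely for illustration.
example_ranks = [1, 2, 4, 10]
print(retrieval_metrics(example_ranks))
```

With the made-up ranks `[1, 2, 4, 10]`, this gives an MRR of about 46.25, a mean rank of 4.25, and R@1, R@5, R@10 of 25, 75, and 100 respectively. Because recall at k can only grow with k, R@1 is never larger than R@5, which is never larger than R@10.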

Methods used in the Paper