Bilinear Attention Networks

NeurIPS 2018 · Jin-Hwa Kim, Jaehyun Jun, Byoung-Tak Zhang

Attention networks in multimodal learning provide an efficient way to utilize given visual information selectively. However, the computational cost to learn attention distributions for every pair of multimodal input channels is prohibitively expensive...

Task                       Dataset                  Model                     Metric    Value  Global rank
Phrase Grounding           Flickr30k Entities Test  BAN (Bottom-Up detector)  R@1       69.69  #2
Phrase Grounding           Flickr30k Entities Test  BAN (Bottom-Up detector)  R@10      86.35  #2
Phrase Grounding           Flickr30k Entities Test  BAN (Bottom-Up detector)  R@5       84.22  #2
Visual Question Answering  VQA v2 test-dev          BAN+Glove+Counter         Accuracy  70.04  #7
Visual Question Answering  VQA v2 test-std          BAN+Glove+Counter         Accuracy  70.4   #8