Focal Visual-Text Attention for Visual Question Answering

CVPR 2018 Junwei LiangLu JiangLiangliang CaoLi-Jia LiAlexander Hauptmann

Recent insights on language and vision with neural networks have been successfully applied to simple single-image visual question answering. However, to tackle real-life question answering problems on multimedia collections such as personal photos, we have to look at whole collections with sequences of photos or videos... (read more)

PDF Abstract
TASK DATASET MODEL METRIC NAME METRIC VALUE GLOBAL RANK RESULT LEADERBOARD
Memex Question Answering MemexQA FVTA Accuracy 0.357 # 1