Question-Guided Hybrid Convolution for Visual Question Answering

ECCV 2018 · Peng Gao, Pan Lu, Hongsheng Li, Shuang Li, Yikang Li, Steven Hoi, Xiaogang Wang

In this paper, we propose a novel Question-Guided Hybrid Convolution (QGHC) network for Visual Question Answering (VQA). Most state-of-the-art VQA methods fuse high-level textual and visual features from the neural network and abandon the visual spatial information when learning multi-modal features. To address these problems, question-guided kernels generated from the input question are designed to be convolved with visual features, capturing the textual-visual relationship at an early stage...
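The core idea above, predicting convolution kernels from the question and applying them to the visual feature map, can be sketched minimally as follows. This is an illustrative toy implementation in NumPy, not the paper's exact architecture: the kernel-prediction layer is a fixed random projection standing in for a learned linear layer, and all names and shapes (`question_guided_conv`, `out_channels`, `ksize`) are assumptions.

```python
import numpy as np

def question_guided_conv(visual_feat, question_emb, out_channels=4, ksize=3, seed=0):
    """Toy sketch of question-guided convolution.

    A projection (here a fixed random matrix standing in for a learned
    layer) maps the question embedding to convolution kernel weights,
    which are then convolved (valid padding, stride 1) with the visual
    feature map. Shapes and names are illustrative only.
    """
    c, h, w = visual_feat.shape
    rng = np.random.default_rng(seed)
    # "Kernel prediction": question embedding -> conv weights.
    proj = rng.standard_normal(
        (out_channels * c * ksize * ksize, question_emb.size)) * 0.01
    kernels = (proj @ question_emb).reshape(out_channels, c, ksize, ksize)
    # Valid convolution of the predicted kernels with the visual features.
    oh, ow = h - ksize + 1, w - ksize + 1
    out = np.empty((out_channels, oh, ow))
    for o in range(out_channels):
        for i in range(oh):
            for j in range(ow):
                patch = visual_feat[:, i:i + ksize, j:j + ksize]
                out[o, i, j] = np.sum(patch * kernels[o])
    return out

# Example: an 8-channel 6x6 visual feature map and a 16-d question embedding
# produce a 4-channel 4x4 question-conditioned feature map.
vf = np.random.default_rng(1).standard_normal((8, 6, 6))
qe = np.random.default_rng(2).standard_normal(16)
result = question_guided_conv(vf, qe)
```

In the paper's full QGHC design the predicted kernels are organized as group convolutions to keep the number of question-predicted parameters tractable; the dense version above is only the simplest possible form of the idea.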

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Visual Question Answering | CLEVR | QGHC+Att+Concat | Accuracy | 65.90 | #1 |
| Visual Question Answering | COCO Visual Question Answering (VQA) real images 1.0, open-ended | QGHC+Att+Concat | Percentage correct | 65.90 | #3 |