Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering

CVPR 2019 · Peng Gao, Zhengkai Jiang, Haoxuan You, Pan Lu, Steven C. H. Hoi, Xiaogang Wang, Hongsheng Li

Learning effective fusion of multi-modality features is at the heart of visual question answering. We propose a novel method to dynamically fuse multi-modal features with intra- and inter-modality information flow, which alternately passes dynamic information between and across the visual and language modalities...
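
The alternating flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the attention here is plain scaled dot-product attention with no learned projections, and the residual additions, the function names (`attend`, `fusion_step`), the feature sizes, and the number of repetitions are all assumptions chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(queries, keys_values):
    # Scaled dot-product attention without learned projections
    # (a simplification; a real model would learn query/key/value maps).
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ keys_values

def fusion_step(visual, language):
    # Inter-modality flow: each modality gathers information from the other.
    # The residual additions are an assumption of this sketch.
    visual_inter = visual + attend(visual, language)
    language_inter = language + attend(language, visual)
    # Intra-modality flow: each modality then attends over its own features.
    visual_out = visual_inter + attend(visual_inter, visual_inter)
    language_out = language_inter + attend(language_inter, language_inter)
    return visual_out, language_out

rng = np.random.default_rng(0)
visual = rng.normal(size=(36, 64))    # hypothetical: 36 image-region features
language = rng.normal(size=(14, 64))  # hypothetical: 14 question-word features

# Alternate the two kinds of flow a couple of times (count is arbitrary here).
for _ in range(2):
    visual, language = fusion_step(visual, language)

print(visual.shape, language.shape)
```

Each step keeps the number of features per modality unchanged, so the inter- and intra-modality passes can be stacked repeatedly before a final answer classifier, which this sketch leaves out.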

Results from the Paper


TASK                       DATASET  MODEL  METRIC NAME  METRIC VALUE  GLOBAL RANK
Visual Question Answering  VQA v2   DFAF   Accuracy     70.34%        # 1
