Question Type Guided Attention in Visual Question Answering

Visual Question Answering (VQA) requires integrating feature maps with drastically different structures and focusing on the correct regions. Image descriptors have structures at multiple spatial scales, while lexical inputs inherently follow a temporal sequence and naturally cluster into semantically different question types...
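
The idea in the abstract, that question types can steer how visual features with different structures are combined, can be illustrated with a minimal sketch. It assumes the question type is given as a one-hot vector, that there is a fixed set of visual feature sources already reduced to equal-length vectors, and that a hypothetical learned score table is turned into per-source weights with a softmax. The names question_type_weights, fuse_features and weight_matrix are hypothetical; this is not the paper's QTA module, only an illustration of question-type-conditioned weighting.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def question_type_weights(qtype_onehot: Sequence[float],
                          weight_matrix: Sequence[Sequence[float]]) -> List[float]:
    # Hypothetical learned table: weight_matrix[k][t] scores feature source k
    # for question type t. With a one-hot question type this selects column t.
    scores = [sum(w * q for w, q in zip(row, qtype_onehot)) for row in weight_matrix]
    return softmax(scores)


def fuse_features(features: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> List[float]:
    # Weighted sum of equal-length feature vectors, one vector per source.
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]


# Two hypothetical feature sources (e.g. grid features and region features)
# and three hypothetical question types.
weight_matrix = [[2.0, 0.0, 0.0],
                 [0.0, 2.0, 0.0]]
features = [[1.0, 1.0], [0.0, 0.0]]
w = question_type_weights([1.0, 0.0, 0.0], weight_matrix)
fused = fuse_features(features, w)
print(w, fused)
```

With the first question type selected, the first source receives weight 1 / (1 + e^-2), about 0.88, so the fused vector leans toward that source; a question type whose column scores both sources equally yields weights of 0.5 each. A softmax keeps the weights positive and summing to one, which makes the fusion a convex combination of the sources.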

Published at ECCV 2018.

Methods used in the Paper


METHOD                      TYPE
Average Pooling             Pooling Operations
ReLU                        Activation Functions
1x1 Convolution             Convolutions
Batch Normalization         Normalization
Bottleneck Residual Block   Skip Connection Blocks
Global Average Pooling      Pooling Operations
Residual Block              Skip Connection Blocks
Kaiming Initialization      Initialization
RPN                         Region Proposal
Max Pooling                 Pooling Operations
Residual Connection         Skip Connections
Softmax                     Output Functions
Convolution                 Convolutions
RoIPool                     RoI Feature Extractors
Faster R-CNN                Object Detection Models
ResNet                      Convolutional Neural Networks
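
Two of the pooling operations among the methods listed above, Global Average Pooling and RoIPool, can be sketched on a single-channel feature map in plain Python. This is a simplified illustration under stated assumptions: the region of interest is a non-empty box given in integer feature-map coordinates (no image-to-feature-map scaling or rounding), there is one channel and no batching, and the function names are hypothetical. It is not the implementation used in the paper or in any library; the bin layout (floored starts, ceiled ends) follows the usual RoIPool scheme.

```python
from typing import List, Tuple

Grid = List[List[float]]


def global_average_pool(fmap: Grid) -> float:
    # Mean over all spatial positions of a single-channel feature map.
    values = [v for row in fmap for v in row]
    return sum(values) / len(values)


def ceil_div(a: int, b: int) -> int:
    # Integer ceiling division for non-negative a and positive b.
    return -(-a // b)


def roi_max_pool(fmap: Grid, roi: Tuple[int, int, int, int],
                 out_h: int, out_w: int) -> Grid:
    # roi = (y0, x0, y1, x1) in feature-map cells, end-exclusive.
    # Each output cell is the max over one bin of the region. Bin starts are
    # floored and bin ends are ceiled, so for a non-empty region every bin is
    # non-empty and the bins together cover the whole region.
    y0, x0, y1, x1 = roi
    h, w = y1 - y0, x1 - x0
    out: Grid = []
    for i in range(out_h):
        ys, ye = y0 + (i * h) // out_h, y0 + ceil_div((i + 1) * h, out_h)
        row = []
        for j in range(out_w):
            xs, xe = x0 + (j * w) // out_w, x0 + ceil_div((j + 1) * w, out_w)
            row.append(max(fmap[y][x] for y in range(ys, ye) for x in range(xs, xe)))
        out.append(row)
    return out


# A 4x4 feature map whose value at (y, x) is 4*y + x.
fmap = [[float(4 * y + x) for x in range(4)] for y in range(4)]
print(global_average_pool(fmap))                 # 7.5
print(roi_max_pool(fmap, (0, 0, 4, 4), 2, 2))    # [[5.0, 7.0], [13.0, 15.0]]
```

RoIPool produces a fixed-size output (here 2x2) regardless of the region's size, which is what lets region features from a detector such as Faster R-CNN be fed to layers expecting a fixed shape; global average pooling instead collapses the entire map to a single value per channel.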