Our analysis leads to a novel taxonomy of visual reasoning tasks, organized primarily by the type of relations (same-different vs. spatial-relation judgments) and the number of relations composing the underlying rules.
We use this new evaluation in a large-scale study of existing approaches for VQA.
First, we propose the Modifying Count Distribution (MCD) protocol, which penalizes models that over-rely on statistical shortcuts.
We propose RUBi, a new learning strategy to reduce biases in any VQA model.
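RUBi reduces question-only biases by adding a question-only branch during training and using a sigmoid of its logits to mask the main model's logits, so examples the question-only branch already answers correctly contribute less gradient to the main model. The sketch below is a minimal numpy illustration of that masking idea, not the authors' implementation; all function names and the toy shapes are assumptions.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    # mean negative log-likelihood of the true class per example
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def rubi_style_loss(main_logits, q_only_logits, labels):
    """RUBi-style debiasing sketch (hypothetical helper, not the official API):
    the main model's answer logits are elementwise-masked by a sigmoid of the
    question-only branch's logits before the classification loss is computed;
    a second loss term trains the question-only branch itself to capture
    the language bias."""
    mask = 1.0 / (1.0 + np.exp(-q_only_logits))   # sigmoid mask in (0, 1)
    fused_logits = main_logits * mask             # down-weights "easy" answers
    loss_fused = cross_entropy(fused_logits, labels)
    loss_q_only = cross_entropy(q_only_logits, labels)
    return loss_fused + loss_q_only

# Toy batch: 4 questions, 10 candidate answers.
rng = np.random.default_rng(0)
main_logits = rng.normal(size=(4, 10))
q_only_logits = rng.normal(size=(4, 10))
labels = np.array([1, 4, 7, 2])
loss = rubi_style_loss(main_logits, q_only_logits, labels)
```

At test time the question-only branch and the mask are dropped and only the main model's predictions are used, which is what lets the strategy plug into any VQA architecture.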
In this paper, we propose MuRel, a multimodal relational network which is learned end-to-end to reason over real images.
This work presents an in-depth analysis of most state-of-the-art deep neural networks (DNNs) proposed for image recognition.