We propose RUBi, a new learning strategy to reduce unimodal biases in any Visual Question Answering (VQA) model.
Ranked #7 on Visual Question Answering on VQA-CP
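Since the strategy is only named here, a minimal PyTorch sketch of RUBi's core idea may help: a question-only branch masks the base model's logits so the base model cannot win by exploiting language biases alone. The class name `RUBiHead` and all tensor names are illustrative assumptions, not the released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RUBiHead(nn.Module):
    """Question-only branch that masks the base VQA logits, RUBi-style.

    Simplified sketch: the full strategy also controls which branches
    receive gradients, which is omitted here.
    """
    def __init__(self, q_dim: int, num_answers: int):
        super().__init__()
        # Question-only classifier that captures language biases.
        self.q_branch = nn.Sequential(
            nn.Linear(q_dim, 512), nn.ReLU(),
            nn.Linear(512, num_answers),
        )

    def forward(self, base_logits, q_emb, answers):
        q_logits = self.q_branch(q_emb)
        # Masking the base predictions with the question-only scores makes
        # biased examples less rewarding for the base model.
        fused_logits = base_logits * torch.sigmoid(q_logits)
        loss = F.cross_entropy(fused_logits, answers) \
             + F.cross_entropy(q_logits, answers)
        return loss, base_logits  # at test time, only base_logits are used
```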
In this paper, we propose MuRel, a multimodal relational network that is trained end-to-end to reason over real images.
Ranked #1 on Visual Question Answering on TDIUC
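As a rough illustration of the kind of pairwise relational reasoning MuRel performs over image regions, here is a simplified sketch; `MuRelLikeCell`, its layer shapes, and the max aggregation are assumptions for illustration, not the paper's exact cell.

```python
import torch
import torch.nn as nn

class MuRelLikeCell(nn.Module):
    """Simplified relational cell in the spirit of MuRel: each region is
    fused with the question, enriched with pairwise relations to every
    other region, and updated residually."""
    def __init__(self, dim: int):
        super().__init__()
        self.fuse = nn.Linear(2 * dim, dim)       # region-question fusion
        self.pairwise = nn.Linear(2 * dim, dim)   # region-region relations

    def forward(self, regions, q_emb):
        # regions: (B, R, D) object features; q_emb: (B, D) question embedding.
        B, R, D = regions.shape
        q = q_emb.unsqueeze(1).expand(-1, R, -1)
        m = torch.relu(self.fuse(torch.cat([regions, q], dim=-1)))  # (B, R, D)
        # All region pairs (i, j): concatenate and score their interaction.
        mi = m.unsqueeze(2).expand(-1, -1, R, -1)                   # (B, R, R, D)
        mj = m.unsqueeze(1).expand(-1, R, -1, -1)
        rel = torch.relu(self.pairwise(torch.cat([mi, mj], dim=-1)))
        context, _ = rel.max(dim=2)        # aggregate pairwise relations
        return regions + m + context       # residual update, same shape
```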
We demonstrate the practical interest of our BLOCK fusion model on two challenging tasks, VQA and Visual Relationship Detection (VRD), for which we design end-to-end learnable architectures that represent the relevant interactions between modalities.
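To make block-term bilinear fusion of the kind BLOCK proposes more concrete, here is a hedged PyTorch sketch: both inputs are projected, split into chunks, and each chunk pair interacts through its own small bilinear map. The `BlockFusion` name, the chunking scheme, and the default sizes are assumptions; the actual model constrains the per-block ranks more carefully.

```python
import torch
import torch.nn as nn

class BlockFusion(nn.Module):
    """Block-term bilinear fusion sketch (not the reference implementation)."""
    def __init__(self, x_dim, y_dim, out_dim, n_blocks=10, chunk=20):
        super().__init__()
        self.n_blocks, self.chunk = n_blocks, chunk
        self.px = nn.Linear(x_dim, n_blocks * chunk)
        self.py = nn.Linear(y_dim, n_blocks * chunk)
        # One small bilinear map per block: (chunk x chunk) -> output chunk.
        assert out_dim % n_blocks == 0
        self.bilinear = nn.ModuleList(
            [nn.Linear(chunk * chunk, out_dim // n_blocks)
             for _ in range(n_blocks)]
        )

    def forward(self, x, y):
        hx = self.px(x).view(-1, self.n_blocks, self.chunk)
        hy = self.py(y).view(-1, self.n_blocks, self.chunk)
        outs = []
        for b, lin in enumerate(self.bilinear):
            # Outer product of the b-th chunks, flattened then projected.
            outer = torch.einsum('bi,bj->bij', hx[:, b], hy[:, b])
            outs.append(lin(outer.flatten(1)))
        return torch.cat(outs, dim=-1)  # (batch, out_dim)
```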
As in self-training methods, the predictions of these initial detectors compensate for the missing annotations on the complementary datasets.
In this paper, we present a method to learn a visual representation adapted for e-commerce products.
Bilinear models provide an appealing framework for mixing and merging information in VQA tasks.
Ranked #20 on Visual Question Answering on VQA v2 test-dev
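A full bilinear interaction z_k = x^T W_k y is too large to store for high-dimensional inputs, which is why these models factorize it. Below is a generic low-rank sketch of that family, using the Hadamard-product form common in VQA fusion; `LowRankBilinear` and the tanh nonlinearities are illustrative choices, not any single paper's exact formulation.

```python
import torch
import torch.nn as nn

class LowRankBilinear(nn.Module):
    """Rank-constrained bilinear fusion: the full bilinear tensor is
    factored into two input projections and one output map."""
    def __init__(self, x_dim, y_dim, hidden, out_dim):
        super().__init__()
        self.ux = nn.Linear(x_dim, hidden)
        self.uy = nn.Linear(y_dim, hidden)
        self.out = nn.Linear(hidden, out_dim)

    def forward(self, x, y):
        # The element-wise product of the projected inputs approximates
        # z_k = x^T W_k y with weight tensors of rank at most `hidden`.
        return self.out(torch.tanh(self.ux(x)) * torch.tanh(self.uy(y)))
```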