Visual Question Answering

105 papers with code · Computer Vision

Greatest papers with code

Learning to Reason: End-to-End Module Networks for Visual Question Answering

ICCV 2017 tensorflow/models

Natural language questions are inherently compositional, and many are most easily answered by reasoning about their decomposition into modular sub-problems.

VISUAL QUESTION ANSWERING

ParlAI: A Dialog Research Software Platform

EMNLP 2017 facebookresearch/ParlAI

We introduce ParlAI (pronounced "par-lay"), an open-source software platform for dialog research implemented in Python, available at http://parl.ai.

VISUAL QUESTION ANSWERING

Hadamard Product for Low-rank Bilinear Pooling

14 Oct 2016 facebookresearch/ParlAI

Bilinear models provide rich representations compared with linear models.

VISUAL QUESTION ANSWERING

Towards VQA Models That Can Read

CVPR 2019 facebookresearch/pythia

We show that LoRRA outperforms existing state-of-the-art VQA models on our TextVQA dataset.

VISUAL QUESTION ANSWERING

Pythia v0.1: the Winning Entry to the VQA Challenge 2018

26 Jul 2018 facebookresearch/pythia

We demonstrate that by making subtle but important changes to the model architecture and the learning rate schedule, fine-tuning image features, and adding data augmentation, we can significantly improve the performance of the up-down model on the VQA v2.0 dataset, from 65.67% to 70.22%.

DATA AUGMENTATION · VISUAL QUESTION ANSWERING

Bilinear Attention Networks

NeurIPS 2018 facebookresearch/pythia

In this paper, we propose bilinear attention networks (BAN) that find bilinear attention distributions to utilize given vision-language information seamlessly.

VISUAL QUESTION ANSWERING

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

CVPR 2018 facebookresearch/pythia

Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning.

IMAGE CAPTIONING · VISUAL QUESTION ANSWERING

Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization

ICCV 2017 jacobgil/pytorch-grad-cam

We propose a technique for producing "visual explanations" for decisions from a large class of CNN-based models, making them more transparent.

IMAGE CLASSIFICATION · INTERPRETABLE MACHINE LEARNING · VISUAL QUESTION ANSWERING

Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge

CVPR 2018 peteanderson80/bottom-up-attention

This paper presents a state-of-the-art model for visual question answering (VQA), which won first place in the 2017 VQA Challenge.

VISUAL QUESTION ANSWERING

Modeling Relationships in Referential Expressions with Compositional Modular Networks

CVPR 2017 hengyuan-hu/bottom-up-attention-vqa

In this paper we present a modular deep architecture capable of analyzing referential expressions into their component parts, identifying entities and relationships mentioned in the input expression and grounding them all in the scene.

VISUAL QUESTION ANSWERING