Visual Question Answering

486 papers with code • 48 benchmarks • 94 datasets

Visual Question Answering (VQA) is a multimodal task in which a model is given an image and a natural language question about it and must produce an answer.

Most implemented papers

Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization

ramprs/grad-cam ICCV 2017

For captioning and VQA, we show that even non-attention based models can localize inputs.

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

peteanderson80/bottom-up-attention CVPR 2018

Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning.

VQA: Visual Question Answering

ramprs/grad-cam ICCV 2015

Given an image and a natural language question about the image, the task is to provide an accurate natural language answer.

ParlAI: A Dialog Research Software Platform

facebookresearch/ParlAI EMNLP 2017

We introduce ParlAI (pronounced "par-lay"), an open-source software platform for dialog research implemented in Python, available at http://parl.ai.

A simple neural network module for relational reasoning

kimhc6028/relational-networks NeurIPS 2017

Relational reasoning is a central component of generally intelligent behavior, but has proven difficult for neural networks to learn.

Stacked Attention Networks for Image Question Answering

zcyang/imageqa-san CVPR 2016

Thus, we develop a multiple-layer SAN in which we query an image multiple times to infer the answer progressively.
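
The idea of querying an image several times can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain dot-product scoring in place of any learned layers, and the names `attention_hop` and `stacked_attention` and the toy feature vectors are hypothetical. Each hop attends over the image regions and then adds the attended summary to the query, so the next hop sees a refined query.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_hop(regions, query):
    """One attention pass (assumption: dot-product scores, no learned weights)."""
    weights = softmax([dot(r, query) for r in regions])
    dim = len(query)
    # Attention-weighted sum of the region features.
    context = [sum(w * r[j] for w, r in zip(weights, regions)) for j in range(dim)]
    # Refine the query by adding the attended image summary to it.
    refined = [c + q for c, q in zip(context, query)]
    return refined, weights


def stacked_attention(regions, question, num_hops=2):
    """Query the same regions num_hops times, refining the query each time."""
    query = list(question)
    history = []
    for _ in range(num_hops):
        query, weights = attention_hop(regions, query)
        history.append(weights)
    return query, history


# Toy data: three image regions and one question vector, all hypothetical.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
question = [1.0, 0.0]
final_query, history = stacked_attention(regions, question, num_hops=2)
```

On this toy input the attention weight on the first region grows from the first hop to the second, which is the progressive narrowing the sentence above describes. A real model would feed `final_query` to an answer classifier.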

Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering

Cyanogenoid/pytorch-vqa 11 Apr 2017

This paper presents a new baseline for the visual question answering task.

ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

facebookresearch/vilbert-multi-task NeurIPS 2019

We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language.

Dynamic Memory Networks for Visual and Textual Question Answering

therne/dmn-tensorflow 4 Mar 2016

Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering.

Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge

peteanderson80/bottom-up-attention CVPR 2018

This paper presents a state-of-the-art model for visual question answering (VQA), which won first place in the 2017 VQA Challenge.