About

Visual Question Answering is a semantic task that aims to answer questions based on an image.

Latest papers without code

Proposal-free One-stage Referring Expression via Grid-Word Cross-Attention

5 May 2021

Referring Expression Comprehension (REC) has become one of the most important tasks in visual reasoning, since it is an essential step for many vision-and-language tasks such as visual question answering.

QUESTION ANSWERING, REFERRING EXPRESSION COMPREHENSION, VISUAL QUESTION ANSWERING, VISUAL REASONING

Iterated learning for emergent systematicity in VQA

ICLR 2021

Although neural module networks have an architectural bias towards compositionality, they require gold standard layouts to generalize systematically in practice.

QUESTION ANSWERING, SYSTEMATIC GENERALIZATION, VISUAL QUESTION ANSWERING

A survey on VQA: Datasets and Approaches

2 May 2021

Visual question answering (VQA) is a task that combines techniques from both computer vision and natural language processing.

QUESTION ANSWERING, VISUAL QUESTION ANSWERING

Chop Chop BERT: Visual Question Answering by Chopping VisualBERT's Heads

30 Apr 2021

Based on this observation, we design a dynamic chopping module that can automatically remove heads and layers of the VisualBERT at an instance level when dealing with different questions.

QUESTION ANSWERING, VISUAL QUESTION ANSWERING, VISUAL REASONING

Optimal training of variational quantum algorithms without barren plateaus

29 Apr 2021

Variational quantum algorithms (VQAs) promise efficient use of near-term quantum computers.

QUANTUM MACHINE LEARNING, VISUAL QUESTION ANSWERING

Document Collection Visual Question Answering

27 Apr 2021

Current tasks and methods in Document Understanding aim to process documents as single elements.

QUESTION ANSWERING, VISUAL QUESTION ANSWERING

InfographicVQA

26 Apr 2021

Infographics are documents designed to effectively communicate information using a combination of textual, graphical and visual elements.

QUESTION ANSWERING, VISUAL QUESTION ANSWERING

RelTransformer: Balancing the Visual Relationship Detection from Local Context, Scene and Memory

24 Apr 2021

The key feature of our model is its ability to aggregate three different-level features (local context, scene, and dataset-level) to compositionally predict the visual relationship.

IMAGE CAPTIONING, OBJECT RECOGNITION, QUESTION ANSWERING, SCENE UNDERSTANDING, VISUAL QUESTION ANSWERING, VISUAL RELATIONSHIP DETECTION

Playing Lottery Tickets with Vision and Language

23 Apr 2021

In this work, we perform the first empirical study to assess whether such trainable subnetworks also exist in pre-trained V+L models.

QUESTION ANSWERING, REFERRING EXPRESSION COMPREHENSION, VISUAL COMMONSENSE REASONING, VISUAL ENTAILMENT, VISUAL QUESTION ANSWERING

GraghVQA: Language-Guided Graph Neural Networks for Graph-based Visual Question Answering

20 Apr 2021

Images are more than a collection of objects or attributes -- they represent a web of relationships among interconnected objects.

GRAPH QUESTION ANSWERING, QUESTION ANSWERING, VISUAL QUESTION ANSWERING