Visual Question Answering

141 papers with code · Computer Vision

Latest papers with code

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks

13 Apr 2020 · microsoft/Oscar

Large-scale pre-training methods of learning cross-modal representations on image-text pairs are becoming popular for vision-language tasks.

TEXT-IMAGE RETRIEVAL · VISUAL QUESTION ANSWERING

43

YouMakeup VQA Challenge: Towards Fine-grained Action Understanding in Domain-Specific Videos

12 Apr 2020 · AIM3-RUC/YouMakeup_Baseline

The goal of the YouMakeup VQA Challenge 2020 is to provide a common benchmark for fine-grained action understanding in domain-specific videos, e.g., makeup instructional videos.

QUESTION ANSWERING · VISUAL QUESTION ANSWERING

17

A negative case analysis of visual grounding methods for VQA

12 Apr 2020 · erobic/negative_analysis_of_grounding

Existing Visual Question Answering (VQA) methods tend to exploit dataset biases and spurious statistical correlations, instead of producing right answers for the right reasons.

QUESTION ANSWERING · VISUAL QUESTION ANSWERING

7

Which visual questions are difficult to answer? Analysis with Entropy of Answer Distributions

12 Apr 2020 · tttamaki/vqd

Detailed analysis of the VQA v2 dataset reveals that 1) all methods perform poorly on the most difficult cluster (about 10% accuracy), 2) as the cluster difficulty increases, the answers predicted by the different methods begin to differ, and 3) the values of cluster entropy are highly correlated with the cluster accuracy.

QUESTION ANSWERING · VISUAL QUESTION ANSWERING

5

X-Linear Attention Networks for Image Captioning

31 Mar 2020 · Panda-Peter/image-captioning

Recent progress on fine-grained visual recognition and visual question answering has featured Bilinear Pooling, which effectively models the $2^{nd}$-order interactions across multi-modal inputs.

FINE-GRAINED VISUAL RECOGNITION · IMAGE CAPTIONING · QUESTION ANSWERING · VISUAL QUESTION ANSWERING

65

Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text

31 Mar 2020 · ricolike/mmgnn_textvqa

Then, we introduce three aggregators which guide the message passing from one graph to another to utilize the contexts in various modalities, so as to refine the features of nodes.

QUESTION ANSWERING · SCENE TEXT · VISUAL QUESTION ANSWERING

6

Counterfactual Samples Synthesizing for Robust Visual Question Answering

14 Mar 2020 · yanxinzju/CSS-VQA

To reduce language biases, several recent works introduce an auxiliary question-only model to regularize the training of the targeted VQA model, and achieve dominant performance on VQA-CP.

QUESTION ANSWERING · VISUAL QUESTION ANSWERING

10

PathVQA: 30000+ Questions for Medical Visual Question Answering

7 Mar 2020 · UCSD-AI4H/PathVQA

To achieve this goal, the first step is to create a visual question answering (VQA) dataset where the AI agent is presented with a pathology image together with a question and is asked to give the correct answer.

MEDICAL VISUAL QUESTION ANSWERING · QUESTION ANSWERING · VISUAL QUESTION ANSWERING

17

Hierarchical Conditional Relation Networks for Video Question Answering

25 Feb 2020 · thaolmk54/hcrn-videoqa

Video question answering (VideoQA) is challenging as it requires modeling capacity to distill dynamic visual artifacts and distant relations and to associate them with linguistic concepts.

QUESTION ANSWERING · VIDEO QUESTION ANSWERING · VISUAL QUESTION ANSWERING

33

On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering

24 Feb 2020 · xinke-wang/Awesome-Text-VQA

Visual Question Answering (VQA) methods have made incredible progress, but suffer from a failure to generalize.

QUESTION ANSWERING · SCENE TEXT · VISUAL QUESTION ANSWERING

40