About

Visual Question Answering (VQA) is a multimodal task that aims to answer a natural-language question about a given image.

Image Source: visualqa.org

Latest papers with code

AdaVQA: Overcoming Language Priors with Adapted Margin Cosine Loss

5 May 2021 · guoyang9/AdaVQA

Experimental results demonstrate that our adapted margin cosine loss greatly enhances the baseline models, with an absolute performance gain of 15% on average, strongly verifying the potential of tackling the language-prior problem in VQA from the angle of answer feature-space learning.

Tasks: Question Answering · Visual Question Answering
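The loss above follows the additive-margin cosine family (as in CosFace): answer logits are cosine similarities, and a margin is subtracted from the ground-truth answer's logit before softmax cross-entropy. A minimal NumPy sketch of that generic form, assuming a fixed margin and scale (illustrative values; the paper's adapted, per-question margins are not reproduced here):

```python
import numpy as np

def margin_cosine_loss(features, weights, labels, margin=0.35, scale=30.0):
    """Additive-margin cosine loss sketch: the margin is subtracted from
    the cosine similarity at each example's ground-truth answer class."""
    # L2-normalise features and class weights so logits are cosines
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = f @ w.T                                  # (batch, num_answers)
    # penalise only the target answer's logit
    cos[np.arange(len(labels)), labels] -= margin
    logits = scale * cos
    # numerically stable softmax cross-entropy over the adjusted logits
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```

Because the margin shrinks the target logit, the loss with a positive margin is strictly larger than without one, which forces answer features to separate by more than the margin in cosine space.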

CLEVR_HYP: A Challenge Dataset and Baselines for Visual Question Answering with Hypothetical Actions over Images

13 Apr 2021 · shailaja183/clevr_hyp

Most existing research on visual question answering (VQA) is limited to information explicitly present in an image or a video.

Tasks: Question Answering · Visual Question Answering

MMBERT: Multimodal BERT Pretraining for Improved Medical VQA

3 Apr 2021 · VirajBagal/MMBERT

Images in the medical domain are fundamentally different from general-domain images.

Tasks: Language Modelling · Question Answering · Visual Question Answering

VisQA: X-raying Vision and Language Reasoning in Transformers

2 Apr 2021 · Theo-Jaunet/VisQA

First, as a result of a collaboration across three fields (machine learning, vision-and-language reasoning, and data analytics), the work led to a direct impact on the design and training of a neural model for VQA, improving model performance as a consequence.

Tasks: Question Answering · Visual Question Answering

An Investigation of Critical Issues in Bias Mitigation Techniques

1 Apr 2021 · erobic/bias-mitigators

We introduce a new dataset called Biased MNIST that enables assessment of robustness to multiple bias sources.

Tasks: Question Answering · Visual Question Answering

Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers

29 Mar 2021 · hila-chefer/Transformer-MM-Explainability

Transformers are increasingly dominating multi-modal reasoning tasks, such as visual question answering, achieving state-of-the-art results thanks to their ability to contextualize information using the self-attention and co-attention mechanisms.

Tasks: Object Detection · Question Answering · Semantic Segmentation · Visual Question Answering
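This paper aggregates attention across layers to build relevancy maps; a simpler relative of that idea, attention rollout (Abnar & Zuidema), conveys the core mechanism of propagating attention through residual-connected transformer layers. A sketch, assuming each layer's attention matrix is row-stochastic (e.g. already averaged over heads) — not the paper's gradient-weighted method:

```python
import numpy as np

def attention_rollout(attn_layers):
    """Compose per-layer attention matrices, adding the identity to
    account for residual connections, to trace how much each output
    token ultimately draws on each input token."""
    n = attn_layers[0].shape[-1]
    rollout = np.eye(n)
    for A in attn_layers:                      # A: (tokens, tokens), rows sum to 1
        A_res = 0.5 * (A + np.eye(n))          # mix in the residual path
        A_res = A_res / A_res.sum(axis=-1, keepdims=True)
        rollout = A_res @ rollout              # compose with earlier layers
    return rollout  # rollout[i, j]: attribution of output token i to input token j
```

Since each adjusted layer matrix stays row-stochastic, the composed rollout does too, so each output token's attributions over input tokens sum to one.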

TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events

29 Mar 2021 · SUTDCV/SUTD-TrafficQA

In this paper, we create a novel dataset, TrafficQA (Traffic Question Answering), which takes the form of video QA based on 10,080 collected in-the-wild videos and 62,535 annotated QA pairs, for benchmarking the cognitive capability of causal-inference and event-understanding models in complex traffic scenarios.

Tasks: Autonomous Vehicles · Causal Inference · Question Answering · Video Question Answering · Visual Question Answering

Quantum agents in the Gym: a variational quantum algorithm for deep Q-learning

28 Mar 2021 · qdevpsi3/qrl-dqn-gym

We compare the performance of our model to that of a classical neural network for agents that need similar time to convergence, and find that our quantum model needs approximately one-third of the parameters of the classical model to solve the Cart Pole environment in a similar number of episodes on average.

Tasks: Decision Making · OpenAI Gym · Q-Learning · Quantum Machine Learning · Visual Question Answering
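As a toy illustration of the variational-quantum-circuit idea behind such Q-functions (not the paper's multi-qubit architecture), a single-qubit circuit can encode a state feature as a rotation angle, apply trainable rotation layers, and read out observable expectations as Q-values — a hypothetical minimal simulation in NumPy:

```python
import numpy as np

def ry(theta):
    """Single-qubit RY rotation matrix."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def quantum_q_values(state_angle, params):
    """Toy variational-circuit Q-function: encode a (pre-scaled) state
    feature as an RY rotation, apply trainable RY layers, and read out
    <Z> and <X> as Q-values for two actions."""
    psi = np.array([1.0, 0.0])            # start in |0>
    psi = ry(state_angle) @ psi           # data-encoding rotation
    for theta in params:                  # variational layers
        psi = ry(theta) @ psi
    Z = np.array([[1.0, 0.0], [0.0, -1.0]])
    X = np.array([[0.0, 1.0], [1.0, 0.0]])
    return np.array([psi @ Z @ psi, psi @ X @ psi])
```

The trainable parameters are just the rotation angles (one per layer), which hints at why such circuits can be far more parameter-frugal than a dense classical network of comparable capacity.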