Visual Question Answering (VQA) is a task that aims to answer natural-language questions about a given image.
Image Source: visualqa.org
Experimental results demonstrate that our adapted margin cosine loss can greatly enhance the baseline models with an absolute performance gain of 15% on average, strongly verifying the potential of tackling the language prior problem in VQA from the angle of answer feature-space learning.
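The snippet above does not spell out the loss formulation, but a margin cosine loss of this kind is commonly defined CosFace-style: logits are cosine similarities between L2-normalized features and class weights, a fixed margin is subtracted from the target-class cosine, and the result is scaled before softmax cross-entropy. A minimal NumPy sketch under that assumption (function name, scale `s`, and margin `m` defaults are illustrative, not from the paper):

```python
import numpy as np

def margin_cosine_loss(features, weights, labels, s=30.0, m=0.35):
    """Additive-margin cosine loss (CosFace-style) -- illustrative sketch.

    features: (N, D) example embeddings (e.g. fused question/image features)
    weights:  (C, D) per-class weight vectors (one per candidate answer)
    labels:   (N,) ground-truth class indices
    s: logit scale, m: cosine margin (hypothetical defaults)
    """
    # L2-normalize both sides so logits are pure cosine similarities
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = f @ w.T                                  # (N, C) cosines
    # Subtract the margin from the target-class cosine only,
    # forcing the target to win by at least m in cosine space
    cos_m = cos.copy()
    cos_m[np.arange(len(labels)), labels] -= m
    logits = s * cos_m
    # Numerically stable softmax cross-entropy
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```

Because the margin penalizes only the ground-truth cosine, the loss is strictly larger for `m > 0` than for `m = 0` on the same batch, which is what pushes answer features into better-separated clusters.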
We also investigate the utility of our model as an object detector on a given label set when fine-tuned in a few-shot setting.
Ranked #1 on Visual Question Answering on CLEVR-Humans
Most existing research on visual question answering (VQA) is limited to information explicitly present in an image or a video.
We use this new evaluation in a large-scale study of existing models.
First, as a result of a collaboration across three fields (machine learning, vision-and-language reasoning, and data analytics), the work led to a direct impact on the design and training of a neural model for VQA, improving model performance as a consequence.
We introduce a new dataset called Biased MNIST that enables assessment of robustness to multiple bias sources.
Transformers are increasingly dominating multi-modal reasoning tasks, such as visual question answering, achieving state-of-the-art results thanks to their ability to contextualize information using the self-attention and co-attention mechanisms.
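The self-attention and co-attention mechanisms mentioned above share one core operation, scaled dot-product attention: queries and keys/values come from the same token sequence in self-attention, and from different modalities (e.g. question words attending over image regions) in co-attention. A single-head, unmasked NumPy sketch (function and parameter names are illustrative):

```python
import numpy as np

def cross_attention(Xq, Xkv, Wq, Wk, Wv):
    """Scaled dot-product attention: queries from Xq, keys/values from Xkv.

    With Xq == Xkv this is self-attention; with Xq = question-token
    features and Xkv = image-region features it sketches co-attention
    across modalities. Single head, no masking -- illustrative only.
    """
    Q, K, V = Xq @ Wq, Xkv @ Wk, Xkv @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])       # pairwise affinities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=-1, keepdims=True)            # rows: attention weights
    return A @ V                                  # contextualized features

def self_attention(X, Wq, Wk, Wv):
    # Self-attention is the special case where every token attends
    # over its own sequence.
    return cross_attention(X, X, Wq, Wk, Wv)
```

Each output row is a convex combination of the value vectors, so every token's representation is contextualized by the tokens (or image regions) it attends to.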
In this paper, we create a novel dataset, TrafficQA (Traffic Question Answering), which takes the form of video QA and comprises 10,080 collected in-the-wild videos with 62,535 annotated QA pairs, for benchmarking the cognitive capability of causal-inference and event-understanding models in complex traffic scenarios.
Ranked #1 on Video Question Answering on SUTD-TrafficQA
We compare the performance of our model to that of a classical neural network for agents that need similar time to convergence, and find that our quantum model needs approximately one-third of the parameters of the classical model to solve the Cart Pole environment in a similar number of episodes on average.