Visual Commonsense Reasoning

20 papers with code • 1 benchmark • 4 datasets

Most implemented papers

ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

facebookresearch/vilbert-multi-task NeurIPS 2019

We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language.
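
ViLBERT keeps separate vision and language streams that interact through co-attentional transformer layers, in which each stream queries the other modality. Below is a minimal, illustrative sketch of one such co-attention block in PyTorch; the dimensions and layer structure are assumptions, not the official implementation.

```python
# Minimal sketch of a ViLBERT-style co-attention block (not the official code).
import torch
import torch.nn as nn

class CoAttentionLayer(nn.Module):
    """One co-attentional block: each stream attends over the other modality."""
    def __init__(self, dim=768, heads=12):
        super().__init__()
        self.txt_attends_img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.img_attends_txt = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_txt = nn.LayerNorm(dim)
        self.norm_img = nn.LayerNorm(dim)

    def forward(self, txt, img):
        # Text queries attend over image keys/values, and vice versa.
        txt_out, _ = self.txt_attends_img(txt, img, img)
        img_out, _ = self.img_attends_txt(img, txt, txt)
        return self.norm_txt(txt + txt_out), self.norm_img(img + img_out)

txt = torch.randn(2, 20, 768)   # token embeddings from the language stream
img = torch.randn(2, 36, 768)   # region features from the vision stream
txt, img = CoAttentionLayer()(txt, img)
```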

UNITER: UNiversal Image-TExt Representation Learning

ChenRocks/UNITER ECCV 2020

Unlike previous work, which applies joint random masking to both modalities, we use conditional masking on pre-training tasks (i.e., masked language/region modeling is conditioned on full observation of image/text).
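
The quoted idea is easy to illustrate: during pretraining, only one modality is masked at a time while the other is left fully observed. The sketch below shows that logic in isolation; the shapes, mask token id, and masking rate are placeholder assumptions rather than UNITER's actual configuration.

```python
# Illustrative sketch of conditional masking (assumed shapes; not the UNITER code).
import torch

def conditional_mask(text_ids, region_feats, mask_text: bool, mask_id=103, p=0.15):
    """Mask tokens in only ONE modality, leaving the other fully observed."""
    text_ids = text_ids.clone()
    region_feats = region_feats.clone()
    if mask_text:
        # Masked language modeling conditioned on the full image.
        mask = torch.rand_like(text_ids, dtype=torch.float) < p
        text_ids[mask] = mask_id
    else:
        # Masked region modeling conditioned on the full text.
        mask = torch.rand(region_feats.shape[:2]) < p
        region_feats[mask] = 0.0
    return text_ids, region_feats

ids = torch.randint(1000, 2000, (2, 20))      # token ids
feats = torch.randn(2, 36, 2048)              # detected region features
masked_ids, full_feats = conditional_mask(ids, feats, mask_text=True)
```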

From Recognition to Cognition: Visual Commonsense Reasoning

rowanz/r2c CVPR 2019

While this task is easy for humans, it is tremendously difficult for today's vision systems, requiring higher-order cognition and commonsense reasoning about the world.
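
The VCR benchmark introduced here is two-staged: answer the question (Q→A), then justify the chosen answer with a rationale (QA→R). The hypothetical record below only sketches what an example of that format might look like; the field names and text are illustrative, not the exact dataset schema.

```python
# Hypothetical record illustrating the two-stage VCR setup (illustrative only).
example = {
    "image": "scene.jpg",
    "question": "Why is [person1] pointing at [person2]?",
    # Q -> A: choose the correct answer from four candidates.
    "answer_choices": [
        "He is telling [person3] that [person2] ordered the pancakes.",
        "He just told a joke.",
        "He is feeling accusatory towards [person2].",
        "He is giving [person2] directions.",
    ],
    "answer_label": 0,
    # QA -> R: given the correct answer, choose the rationale that justifies it.
    "rationale_choices": [
        "[person1] is looking at [person2] and the food in front of him.",
        "Pointing is a common way to single someone out.",
        "[person2] is wearing a uniform.",
        "The table has already been cleared.",
    ],
    "rationale_label": 0,
}
```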

VL-BERT: Pre-training of Generic Visual-Linguistic Representations

jackroos/VL-BERT ICLR 2020

We introduce a new pre-trainable generic representation for visual-linguistic tasks, called Visual-Linguistic BERT (VL-BERT for short).
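
VL-BERT is single-stream: word tokens and region-of-interest (RoI) features are embedded into one sequence processed by a shared Transformer. A simplified sketch of that input construction follows; the embedding composition and all sizes are illustrative assumptions.

```python
# Simplified sketch of a VL-BERT-style single-stream input (sizes are assumptions).
import torch
import torch.nn as nn

dim, n_tokens, n_regions = 768, 20, 36
word_emb = nn.Embedding(30522, dim)
visual_proj = nn.Linear(2048, dim)       # project RoI features to model dim
segment_emb = nn.Embedding(2, dim)       # 0 = text, 1 = visual

tokens = torch.randint(0, 30522, (1, n_tokens))
rois = torch.randn(1, n_regions, 2048)

text_part = word_emb(tokens) + segment_emb(torch.zeros(1, n_tokens, dtype=torch.long))
vis_part = visual_proj(rois) + segment_emb(torch.ones(1, n_regions, dtype=torch.long))

# One concatenated sequence fed to a shared (single-stream) Transformer encoder.
sequence = torch.cat([text_part, vis_part], dim=1)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, nhead=12, batch_first=True), num_layers=2)
out = encoder(sequence)
```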

Large-Scale Adversarial Training for Vision-and-Language Representation Learning

zhegan27/VILLA NeurIPS 2020

We present VILLA, the first known effort on large-scale adversarial training for vision-and-language (V+L) representation learning.
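
VILLA's adversarial training operates in embedding space rather than on raw pixels or tokens. The sketch below shows a single PGD-style perturbation step on input embeddings; the model, loss, and step size are placeholders, not VILLA's actual training loop.

```python
# Sketch of embedding-space adversarial perturbation in the spirit of VILLA.
import torch

def adversarial_step(model, embeddings, labels, loss_fn, epsilon=1e-3):
    """Perturb input embeddings along the gradient and return the adversarial loss."""
    delta = torch.zeros_like(embeddings, requires_grad=True)
    loss = loss_fn(model(embeddings + delta), labels)
    grad, = torch.autograd.grad(loss, delta)
    # Normalized ascent step on the perturbation, then re-evaluate the loss.
    delta_adv = epsilon * grad / (grad.norm() + 1e-12)
    return loss_fn(model(embeddings + delta_adv), labels)

# Placeholder model and data just to exercise the function.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(10 * 768, 4))
emb = torch.randn(2, 10, 768)
labels = torch.tensor([1, 3])
adv_loss = adversarial_step(model, emb, labels, torch.nn.functional.cross_entropy)
```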

Think Visually: Question Answering through Virtual Imagery

umich-vl/think_visually ACL 2018

In this paper, we study the problem of geometric reasoning in the context of question-answering.

Fusion of Detected Objects in Text for Visual Question Answering

google-research/language IJCNLP 2019

To advance models of multimodal context, we introduce a simple yet powerful neural architecture for data that combines vision and natural language.
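
Roughly, the architecture fuses features of detected objects into the text representation at the positions where those objects are mentioned. A simplified sketch of that fusion is shown below; the projection layer and the token-to-object mapping are assumptions for illustration.

```python
# Illustrative sketch of fusing detected-object features into token embeddings
# at the positions that mention them (simplified; names are assumptions).
import torch
import torch.nn as nn

dim = 768
box_proj = nn.Linear(2048, dim)

token_emb = torch.randn(1, 12, dim)       # text token embeddings
obj_feats = torch.randn(1, 3, 2048)       # features of 3 detected objects
mentions = {4: 0, 9: 2}                   # token index -> detected object index

fused = token_emb.clone()
projected = box_proj(obj_feats)
for tok_idx, obj_idx in mentions.items():
    # Add the projected box feature to the token that refers to the object.
    fused[0, tok_idx] += projected[0, obj_idx]
```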

Heterogeneous Graph Learning for Visual Commonsense Reasoning

yuweijiang/HGL-pytorch NeurIPS 2019

Our HGL consists of a primal vision-to-answer heterogeneous graph (VAHG) module and a dual question-to-answer heterogeneous graph (QAHG) module to interactively refine reasoning paths for semantic agreement.
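
Heterogeneous graph modules like VAHG pass messages between node sets of different types, e.g. visual regions and answer words. The rough sketch below approximates one such message-passing step with a soft adjacency; it is a loose simplification of the paper's modules, and all names and shapes are assumptions.

```python
# Rough sketch of one message-passing step on a vision-to-answer graph (VAHG-like).
import torch
import torch.nn as nn

dim = 512
vision_nodes = torch.randn(1, 36, dim)   # region features
answer_nodes = torch.randn(1, 8, dim)    # answer word features
edge_proj = nn.Linear(dim, dim)

# Soft adjacency between the two heterogeneous node types.
adj = torch.softmax(answer_nodes @ vision_nodes.transpose(1, 2) / dim ** 0.5, dim=-1)
# Each answer node aggregates messages from vision nodes along the graph edges.
answer_updated = answer_nodes + edge_proj(adj @ vision_nodes)
```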

TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines

Deanplayerljx/tab-vcr NeurIPS 2019

Despite impressive recent progress on tasks that necessitate reasoning, such as visual question answering and visual dialog, models often exploit biases in datasets.

Connective Cognition Network for Directional Visual Commonsense Reasoning

AmingWu/CCN NeurIPS 2019

Toward VCR, we propose a connective cognition network (CCN) that dynamically reorganizes visual neuron connectivity, contextualized by the meaning of questions and answers.
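
A minimal way to picture text-conditioned connectivity is to gate the visual features with a question/answer representation before computing pairwise links between them. The sketch below does exactly that as a conceptual simplification; it is not the CCN architecture, and every name and shape is an assumption.

```python
# Conceptual sketch of text-conditioned visual connectivity (not the CCN code).
import torch
import torch.nn as nn

dim = 512
visual = torch.randn(1, 36, dim)          # visual "neurons" (region features)
text_ctx = torch.randn(1, dim)            # pooled question+answer representation

gate = nn.Linear(dim, dim)
# Condition each visual node on the text before computing pairwise connectivity.
conditioned = visual * torch.sigmoid(gate(text_ctx)).unsqueeze(1)
connectivity = torch.softmax(
    conditioned @ conditioned.transpose(1, 2) / dim ** 0.5, dim=-1)
# Propagate information along the reorganized connectivity graph.
visual_out = connectivity @ visual
```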