Visual Commonsense Reasoning

29 papers with code • 7 benchmarks • 7 datasets
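For context, the standard VCR benchmark (Zellers et al., 2019) poses two four-way multiple-choice subtasks per image: question answering (Q→A) and answer justification (QA→R), with the joint metric Q→AR counting an example as correct only when both choices are right. The sketch below, assuming a hypothetical per-example prediction format (the field names are illustrative, not from any official evaluation API), shows how the three accuracies relate:

```python
# Hedged sketch: computing the three standard VCR metrics from
# per-example predictions. Field names are illustrative only.

def vcr_accuracy(examples):
    """Each example holds predicted/gold indices (0-3) for the
    Q->A (answer) and QA->R (rationale) four-way choices."""
    qa = qar = q2ar = 0
    for ex in examples:
        a_ok = ex["pred_answer"] == ex["gold_answer"]
        r_ok = ex["pred_rationale"] == ex["gold_rationale"]
        qa += a_ok
        qar += r_ok
        q2ar += a_ok and r_ok  # Q->AR requires both picks to be correct
    n = len(examples)
    return qa / n, qar / n, q2ar / n

preds = [
    {"pred_answer": 1, "gold_answer": 1, "pred_rationale": 2, "gold_rationale": 2},
    {"pred_answer": 0, "gold_answer": 3, "pred_rationale": 1, "gold_rationale": 1},
]
print(vcr_accuracy(preds))  # -> (0.5, 1.0, 0.5)
```

Because Q→AR is a conjunction, it is always bounded above by the smaller of the two subtask accuracies, which is why leaderboard Q→AR numbers trail Q→A and QA→R.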

Most implemented papers

Heterogeneous Graph Learning for Visual Commonsense Reasoning

yuweijiang/HGL-pytorch NeurIPS 2019

Our HGL consists of a primal vision-to-answer heterogeneous graph (VAHG) module and a dual question-to-answer heterogeneous graph (QAHG) module to interactively refine reasoning paths for semantic agreement.

TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines

Deanplayerljx/tab-vcr NeurIPS 2019

Despite impressive recent progress that has been reported on tasks that necessitate reasoning, such as visual question answering and visual dialog, models often exploit biases in datasets.

Connective Cognition Network for Directional Visual Commonsense Reasoning

AmingWu/CCN NeurIPS 2019

Inspired by this idea, towards VCR, we propose a connective cognition network (CCN) to dynamically reorganize the visual neuron connectivity that is contextualized by the meaning of questions and answers.

Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs

allenai/visual-reasoning-rationalization Findings of the Association for Computational Linguistics 2020

Natural language rationales could provide intuitive, higher-level explanations that are easily understandable by humans, complementing the more broadly studied lower-level explanations based on gradients or attention weights.

MERLOT: Multimodal Neural Script Knowledge Models

rowanz/merlot NeurIPS 2021

As humans, we understand events in the visual world contextually, performing multimodal reasoning across time to make inferences about the past, present, and future.

Cognitive Visual Commonsense Reasoning Using Dynamic Working Memory

tanjatang/DMVCR 4 Jul 2021

Moreover, the proposed model provides an intuitive interpretation of its visual commonsense reasoning process.

Interpretable Visual Understanding with Cognitive Attention Network

tanjatang/CAN 6 Aug 2021

While image understanding at the recognition level has achieved remarkable advances, reliable visual scene understanding requires comprehension not only at the recognition level but also at the cognition level, which calls for exploiting multi-source information and learning multiple levels of understanding along with extensive commonsense knowledge.

Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning

wadeyin9712/gd-vcr EMNLP 2021

Commonsense is defined as the knowledge that is shared by everyone.

Towards artificial general intelligence via a multimodal foundation model

neilfei/brivl-nmi 27 Oct 2021

To overcome this limitation and take a solid step towards artificial general intelligence (AGI), we develop a foundation model pre-trained with huge multimodal data, which can be quickly adapted for various downstream cognitive tasks.