Visual Commonsense Reasoning
29 papers with code • 7 benchmarks • 7 datasets
Latest papers with no code
Making Large Multimodal Models Understand Arbitrary Visual Prompts
We present ViP-Bench, a comprehensive benchmark for assessing how well models understand visual prompts across multiple dimensions, enabling future research in this domain.
Improving Vision-and-Language Reasoning via Spatial Relations Modeling
We design two pre-training tasks, object position regression (OPR) and spatial relation classification (SRC), which learn to reconstruct the spatial relation graph.
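To make the two pre-training tasks concrete, here is a minimal sketch of what such heads could look like; the module names, feature dimension, and number of relation classes are assumptions for illustration, not details from the paper.

```python
import torch
import torch.nn as nn

FEAT_DIM = 768                 # assumed object feature dimension
NUM_RELATION_CLASSES = 11      # assumed spatial-relation label set size

class ObjectPositionRegressionHead(nn.Module):
    """OPR: regress each object's box (x, y, w, h) from its feature."""
    def __init__(self, feat_dim: int = FEAT_DIM):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.GELU(), nn.Linear(feat_dim, 4)
        )

    def forward(self, obj_feats: torch.Tensor) -> torch.Tensor:
        # obj_feats: (batch, num_objects, feat_dim) -> (batch, num_objects, 4)
        return self.mlp(obj_feats)

class SpatialRelationClassificationHead(nn.Module):
    """SRC: classify the spatial relation between every pair of objects."""
    def __init__(self, feat_dim: int = FEAT_DIM,
                 num_classes: int = NUM_RELATION_CLASSES):
        super().__init__()
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, obj_feats: torch.Tensor) -> torch.Tensor:
        # Concatenate features of every (i, j) object pair.
        b, n, d = obj_feats.shape
        fi = obj_feats.unsqueeze(2).expand(b, n, n, d)
        fj = obj_feats.unsqueeze(1).expand(b, n, n, d)
        pair = torch.cat([fi, fj], dim=-1)     # (b, n, n, 2d)
        return self.classifier(pair)           # (b, n, n, num_classes)

# Typical loss choices for such heads: MSE for OPR, cross-entropy for SRC.
opr_loss_fn = nn.MSELoss()
src_loss_fn = nn.CrossEntropyLoss()
```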
ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models
We categorize the problem of VCR into visual commonsense understanding (VCU) and visual commonsense inference (VCI).
Discovering Novel Actions in an Open World with Object-Grounded Visual Commonsense Reasoning
Learning to infer labels in an open world, i.e., in an environment where the target "labels" are unknown, is an important characteristic for achieving autonomy.
GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions
Generalization to unseen tasks is an important ability for few-shot learners to achieve better zero-/few-shot performance on diverse tasks.
CAVL: Learning Contrastive and Adaptive Representations of Vision and Language
Visual and linguistic pre-training aims to learn vision and language representations together, which can be transferred to visual-linguistic downstream tasks.
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images
We introduce WHOOPS!, a new dataset and benchmark for visual commonsense.
Learning to Agree on Vision Attention for Visual Commonsense Reasoning
Visual Commonsense Reasoning (VCR) remains a significant yet challenging research problem in the realm of visual reasoning.
Multi-modal Large Language Model Enhanced Pseudo 3D Perception Framework for Visual Commonsense Reasoning
BLIP-2, as an MLLM, is employed to process images and texts, and referring expressions in the text that involve specific visual objects are replaced with linguistic object labels so they serve as comprehensible MLLM inputs.
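As a rough illustration of this kind of text preprocessing, the sketch below replaces dataset-style referring tags with plain object labels; the "[name+index]" tag format, the label mapping, and the fallback rule are assumptions, not the paper's actual implementation.

```python
import re

def replace_referring_expressions(text: str, object_labels: dict[str, str]) -> str:
    """Replace referring tags such as '[person1]' with linguistic object labels
    so an off-the-shelf MLLM can read the text. Tag format is an assumption."""
    def _sub(match: re.Match) -> str:
        tag = match.group(1)                         # e.g. 'person1'
        # Fall back to the bare category name if no explicit label is given.
        return object_labels.get(tag, re.sub(r"\d+$", "", tag))
    return re.sub(r"\[([A-Za-z]+\d*)\]", _sub, text)

# Usage: '[person1]' becomes 'the man on the left'; '[car2]' falls back to 'car'.
print(replace_referring_expressions(
    "Why is [person1] pointing at [car2]?",
    {"person1": "the man on the left"},
))
```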
A survey on knowledge-enhanced multimodal learning
Multimodal learning has been a field of increasing interest, aiming to combine various modalities in a single joint representation.