Visual Reasoning

215 papers with code • 12 benchmarks • 41 datasets

The ability to understand actions and reasoning associated with visual images.

Libraries

Use these libraries to find Visual Reasoning models and implementations
See all 7 libraries.

Latest papers with no code

Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners

no code yet • 30 Apr 2024

We propose the Language-Regularized Concept Learner (LARC), which uses constraints from language as regularization to significantly improve the accuracy of neuro-symbolic concept learners in the naturally supervised setting.

BlenderAlchemy: Editing 3D Graphics with Vision-Language Models

no code yet • 26 Apr 2024

Specifically, we design a vision-based edit generator and state evaluator to work together to find the correct sequence of actions to achieve the goal.
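A minimal sketch of the general generator-and-evaluator search pattern this abstract describes: a generator proposes candidate edits, an evaluator scores the edited result against the goal, and the best candidate is kept at each step. The function names and greedy strategy are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    edit: str     # a proposed editing action (e.g. a snippet of editing code)
    score: float  # evaluator's judgment of how close the edited state is to the goal


def search_edits(
    goal: str,
    state: str,
    generate: Callable[[str, str], List[str]],  # proposes candidate edits for a state and goal
    evaluate: Callable[[str, str], float],      # scores an edited state against the goal
    apply_edit: Callable[[str, str], str],      # applies an edit and returns the new state
    steps: int = 5,
) -> str:
    """Greedy search: at each step, keep the highest-scoring candidate edit."""
    for _ in range(steps):
        candidates = [
            Candidate(edit, evaluate(apply_edit(state, edit), goal))
            for edit in generate(state, goal)
        ]
        best = max(candidates, key=lambda c: c.score)
        state = apply_edit(state, best.edit)
    return state
```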

Cantor: Inspiring Multimodal Chain-of-Thought of MLLM

no code yet • 24 Apr 2024

This paper delves into the realm of multimodal CoT to solve intricate visual reasoning tasks with multimodal large language models (MLLMs) and their cognitive capability.

Think-Program-reCtify: 3D Situated Reasoning with Large Language Models

no code yet • 23 Apr 2024

The Think phase first decomposes the compositional question into a sequence of steps, and then the Program phase grounds each step to a piece of code and calls carefully designed 3D visual perception modules.
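To make the Think-then-Program flow concrete, here is a hedged sketch of such a decompose-then-ground pipeline. The helpers `llm_decompose`, `llm_to_code`, and the perception module names are hypothetical stand-ins for whatever models and 3D perception tools a system of this kind would plug in; this is not the paper's actual API.

```python
from typing import Callable, Dict, List


def answer_question(
    question: str,
    scene: object,
    llm_decompose: Callable[[str], List[str]],  # Think: question -> ordered reasoning steps
    llm_to_code: Callable[[str], str],          # Program: one step -> an executable snippet
    perception_modules: Dict[str, Callable],    # e.g. {"locate": ..., "relate": ...} (hypothetical)
):
    """Decompose the question, ground each step to code, and execute it over the scene."""
    steps = llm_decompose(question)
    context = {"scene": scene, **perception_modules}
    result = None
    for step in steps:
        code = llm_to_code(step)   # each step becomes a small program
        exec(code, context)        # snippets may call the perception modules on the scene
        result = context.get("result", result)
    return result
```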

Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases

no code yet • 16 Apr 2024

Owing to their remarkable visual reasoning ability to understand images and videos, Large Vision-Language Models (LVLMs) have received widespread attention in the autonomous driving domain, significantly advancing the development of interpretable end-to-end autonomous driving.

Wu's Method can Boost Symbolic AI to Rival Silver Medalists and AlphaGeometry to Outperform Gold Medalists at IMO Geometry

no code yet • 9 Apr 2024

In this note, we revisit the IMO-AG-30 Challenge introduced with AlphaGeometry, and find that Wu's method is surprisingly strong.

Plug-and-Play Grounding of Reasoning in Multimodal Large Language Models

no code yet • 28 Mar 2024

The surge of Multimodal Large Language Models (MLLMs), given their prominent emergent capabilities in instruction following and reasoning, has greatly advanced the field of visual reasoning.

PropTest: Automatic Property Testing for Improved Visual Programming

no code yet • 25 Mar 2024

Visual Programming has emerged as an alternative to end-to-end black-box visual reasoning models.
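In the visual-programming setting, property testing can be illustrated as follows: before trusting a generated program's answer, check it against simple properties implied by the question type, and fall back to another candidate program if the check fails. The property checks and function names below are illustrative assumptions, not PropTest's actual tests.

```python
from typing import Callable, Iterable, Optional


def check_properties(answer, question_type: str) -> bool:
    """Return True if the answer satisfies basic properties for its question type."""
    if question_type == "count":
        return isinstance(answer, int) and answer >= 0
    if question_type == "yes/no":
        return isinstance(answer, str) and answer.lower() in {"yes", "no"}
    if question_type == "color":
        return isinstance(answer, str) and 0 < len(answer.split()) <= 2
    return answer is not None


def run_with_property_tests(
    programs: Iterable[str],
    execute: Callable[[str], object],  # runs a candidate visual program and returns its answer
    question_type: str,
) -> Optional[object]:
    """Try candidate programs in order; return the first answer that passes the property tests."""
    for program in programs:
        answer = execute(program)
        if check_properties(answer, question_type):
            return answer
    return None
```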

VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding

no code yet • 21 Mar 2024

In contrast, this paper introduces a Video Understanding and Reasoning Framework (VURF) based on the reasoning power of LLMs.
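A rough sketch of an LLM-driven generate, execute, and self-refine loop for video queries, in the spirit of such a framework: generate a program, run it over the video, and on failure feed the error back to the LLM for a revised program. `generate_program` and `refine_program` are hypothetical LLM calls, not the paper's interface.

```python
from typing import Callable, Optional


def solve_video_query(
    query: str,
    video: object,
    generate_program: Callable[[str], str],            # LLM: query -> candidate program text
    refine_program: Callable[[str, str, str], str],    # LLM: (query, program, error) -> revised program
    max_rounds: int = 3,
) -> Optional[object]:
    """Generate a program, execute it, and self-refine using execution errors as feedback."""
    program = generate_program(query)
    for _ in range(max_rounds):
        namespace = {"video": video}
        try:
            exec(program, namespace)          # the program is expected to set `result`
            return namespace.get("result")
        except Exception as err:              # feed the error back for refinement
            program = refine_program(query, program, str(err))
    return None
```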

HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning

no code yet • 19 Mar 2024

Recent advances in visual reasoning (VR), particularly with the aid of Large Vision-Language Models (VLMs), show promise but require access to large-scale datasets and face challenges such as high computational costs and limited generalization capabilities.