Visual Reasoning

214 papers with code • 12 benchmarks • 41 datasets

The ability to understand actions and reasoning associated with visual images.

Libraries

Use these libraries to find Visual Reasoning models and implementations
See all 7 libraries.

Visual Reasoning in Object-Centric Deep Neural Networks: A Comparative Cognition Approach

GuillermoPuebla/object-centric-reasoning 20 Feb 2024

To this end, these models use several kinds of attention mechanisms to segregate the individual objects in a scene from the background and from other objects.
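As a toy illustration of the idea (not the paper's own architecture), dot-product attention can concentrate its weight on locations whose features match a query, separating an "object" from the background. All values below are hypothetical; real object-centric models learn these features and queries.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over all entries.
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy 4x4 "feature map": each cell holds a scalar feature.
# Object pixels carry feature 1.0, background 0.0 (an assumption
# for illustration only).
features = np.zeros((4, 4))
features[1:3, 1:3] = 1.0           # a 2x2 "object"
flat = features.reshape(-1)        # 16 spatial locations

# A query tuned to the object feature; attention scores are
# query * feature, normalized with a softmax over locations.
query = 5.0
attn = softmax(query * flat)

# Attention mass concentrates on the object, segregating it
# from the background.
object_mask = (features == 1.0).reshape(-1)
print(attn[object_mask].sum())
```

With this query, well over 90% of the attention mass lands on the four object cells, which is the segregation effect the attention mechanisms above are designed to achieve at scale.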

0 stars • 20 Feb 2024

CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations

thudm/cogcom 6 Feb 2024

Vision-Language Models (VLMs) have demonstrated their widespread viability thanks to extensive training in aligning visual instructions to answers.

125 stars • 06 Feb 2024

Neural networks for abstraction and reasoning: Towards broad generalization in machines

mxbi/arckit 5 Feb 2024

We present the Perceptual Abstraction and Reasoning Language (PeARL), which allows DreamCoder to solve ARC tasks, and propose a new recognition model that significantly improves on the previous best implementation. We also propose a new encoding and augmentation scheme that allows large language models (LLMs) to solve ARC tasks, and find that the largest models can solve some of them.

63 stars • 05 Feb 2024

Prompting Large Vision-Language Models for Compositional Reasoning

tossowski/keycomp 20 Jan 2024

Vision-language models such as CLIP have shown impressive capabilities in encoding texts and images into aligned embeddings, enabling the retrieval of multimodal data in a shared embedding space.
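Retrieval in a shared embedding space reduces to nearest-neighbor search by cosine similarity. The sketch below is a minimal illustration with random vectors standing in for CLIP outputs; the embeddings and their dimensionality are assumptions, not the paper's setup.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Project embeddings onto the unit sphere so the dot
    # product equals cosine similarity.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Hypothetical pre-computed embeddings: 4 images and 1 text query,
# both mapped into the same 512-d space by a CLIP-style model.
rng = np.random.default_rng(0)
image_embeds = l2_normalize(rng.normal(size=(4, 512)))
text_embed = l2_normalize(rng.normal(size=(512,)))

# Retrieval: rank images by cosine similarity to the text query.
similarities = image_embeds @ text_embed
ranking = np.argsort(-similarities)
print(ranking)  # image indices, best match first
```

Because both modalities live in one space, the same ranking works in either direction (text-to-image or image-to-text), which is what makes these aligned embeddings useful for multimodal retrieval.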

2 stars • 20 Jan 2024

Image Safeguarding: Reasoning with Conditional Vision Language Model and Obfuscating Unsafe Content Counterfactually

secureaiautonomylab/conditionalvlm 19 Jan 2024

This process involves addressing two key problems: (1) obfuscating unsafe images demands that the platform provide an accurate rationale grounded in the unsafe image's specific attributes, and (2) the unsafe regions of the image must be minimally obfuscated while the safe regions are still depicted.

0 stars • 19 Jan 2024

VCoder: Versatile Vision Encoders for Multimodal Large Language Models

shi-labs/vcoder 21 Dec 2023

Secondly, we leverage the images from COCO and outputs from off-the-shelf vision perception models to create our COCO Segmentation Text (COST) dataset for training and evaluating MLLMs on the object perception task.

232 stars • 21 Dec 2023

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise

bradyfu/awesome-multimodal-large-language-models 19 Dec 2023

They endow Large Language Models (LLMs) with powerful capabilities in visual understanding, enabling them to tackle diverse multi-modal tasks.

9,078 stars • 19 Dec 2023

One Self-Configurable Model to Solve Many Abstract Visual Reasoning Problems

mikomel/sal 15 Dec 2023

With the aim of developing universal learning systems in the AVR domain, we propose the unified model for solving Single-Choice Abstract visual Reasoning tasks (SCAR), capable of solving various single-choice AVR tasks, without making any a priori assumptions about the task structure, in particular the number and location of panels.

0 stars • 15 Dec 2023

BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models

aifeg/benchlmm 5 Dec 2023

Large Multimodal Models (LMMs) such as GPT-4V and LLaVA have shown remarkable capabilities in visual reasoning with common image styles.

80 stars • 05 Dec 2023

X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning

artemisp/lavis-xinstructblip 30 Nov 2023

Vision-language pre-training and instruction tuning have demonstrated general-purpose capabilities in 2D visual reasoning tasks by aligning visual encoders with state-of-the-art large language models (LLMs).

38 stars • 30 Nov 2023