Visual Reasoning

212 papers with code • 12 benchmarks • 41 datasets

The ability to understand actions, and to reason about them, in arbitrary visual images.


Neural networks for abstraction and reasoning: Towards broad generalization in machines

mxbi/arckit 5 Feb 2024

We present the Perceptual Abstraction and Reasoning Language (PeARL), which allows DreamCoder to solve ARC tasks, and propose a new recognition model that significantly improves on the previous best implementation. We also propose a new encoding and augmentation scheme that allows large language models (LLMs) to solve ARC tasks, and find that the largest models can solve some ARC tasks.
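The paper's actual encoding and augmentation scheme is not reproduced here; as a purely illustrative sketch (function names and format are assumptions, not the paper's method), an ARC grid can be serialized as one digit string per row for an LLM prompt, and augmented by geometric transforms such as rotation:

```python
# Illustrative sketch only: the paper's real encoding/augmentation scheme
# may differ. ARC grids are small 2-D arrays of colour indices (0-9).

def encode_grid(grid):
    """Serialize an ARC grid as one digit string per row."""
    return "\n".join("".join(str(cell) for cell in row) for row in grid)

def rotate90(grid):
    """Rotate a grid 90 degrees clockwise -- a simple augmentation."""
    return [list(row) for row in zip(*grid[::-1])]

example = [[0, 1], [2, 3]]
print(encode_grid(example))            # 01 / 23
print(encode_grid(rotate90(example)))  # 20 / 31
```

Rotations, reflections, and colour permutations are natural augmentations for ARC because task semantics are invariant under them.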


Prompting Large Vision-Language Models for Compositional Reasoning

tossowski/keycomp 20 Jan 2024

Vision-language models such as CLIP have shown impressive capabilities in encoding texts and images into aligned embeddings, enabling the retrieval of multimodal data in a shared embedding space.
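Retrieval in a shared embedding space reduces to cosine similarity between L2-normalized text and image embeddings. A toy NumPy sketch (the vectors below are random stand-ins, not real CLIP outputs):

```python
import numpy as np

# Toy sketch of retrieval in a CLIP-style shared embedding space.
# Real embeddings come from a vision-language encoder; random vectors
# stand in for them here.

rng = np.random.default_rng(0)
image_embs = rng.normal(size=(5, 512))   # 5 "image" embeddings
text_emb = rng.normal(size=(512,))       # 1 "query caption" embedding

# L2-normalize so the dot product equals cosine similarity
image_embs /= np.linalg.norm(image_embs, axis=1, keepdims=True)
text_emb /= np.linalg.norm(text_emb)

scores = image_embs @ text_emb           # cosine similarities
best = int(np.argmax(scores))            # index of best-matching image
print(best, scores[best])
```

With a real model, `image_embs` and `text_emb` would come from the image and text encoders respectively; the retrieval step itself is unchanged.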


Image Safeguarding: Reasoning with Conditional Vision Language Model and Obfuscating Unsafe Content Counterfactually

secureaiautonomylab/conditionalvlm 19 Jan 2024

This process involves addressing two key problems: (1) obfuscating an unsafe image demands that the platform provide an accurate rationale grounded in attributes specific to that unsafe image, and (2) the unsafe regions of the image must be minimally obfuscated while the safe regions are still depicted.
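Minimal obfuscation can be sketched as masking only a bounding box while leaving the rest of the image intact (the box coordinates and blank-out masking here are illustrative assumptions, not the paper's counterfactual method):

```python
import numpy as np

# Illustrative sketch: obfuscate only a bounding box, leaving safe
# regions untouched. The paper's counterfactual approach is more involved.

def obfuscate_region(image, y0, y1, x0, x1):
    """Return a copy of `image` with the box [y0:y1, x0:x1] blanked out."""
    out = image.copy()
    out[y0:y1, x0:x1] = 0
    return out

img = np.full((8, 8), 255, dtype=np.uint8)      # all-white test image
masked = obfuscate_region(img, 2, 4, 2, 4)      # hide a 2x2 region
assert masked[3, 3] == 0 and masked[0, 0] == 255
```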


VCoder: Versatile Vision Encoders for Multimodal Large Language Models

shi-labs/vcoder 21 Dec 2023

Secondly, we leverage the images from COCO and outputs from off-the-shelf vision perception models to create our COCO Segmentation Text (COST) dataset for training and evaluating MLLMs on the object perception task.


A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise

bradyfu/awesome-multimodal-large-language-models 19 Dec 2023

They endow Large Language Models (LLMs) with powerful capabilities in visual understanding, enabling them to tackle diverse multi-modal tasks.


One Self-Configurable Model to Solve Many Abstract Visual Reasoning Problems

mikomel/sal 15 Dec 2023

With the aim of developing universal learning systems in the AVR domain, we propose the unified model for solving Single-Choice Abstract visual Reasoning tasks (SCAR), capable of solving various single-choice AVR tasks, without making any a priori assumptions about the task structure, in particular the number and location of panels.


BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models

aifeg/benchlmm 5 Dec 2023

Large Multimodal Models (LMMs) such as GPT-4V and LLaVA have shown remarkable capabilities in visual reasoning with common image styles.


X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning

artemisp/lavis-xinstructblip 30 Nov 2023

Vision-language pre-training and instruction tuning have demonstrated general-purpose capabilities in 2D visual reasoning tasks by aligning visual encoders with state-of-the-art large language models (LLMs).


MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

01-ai/yi 27 Nov 2023

We introduce MMMU: a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning.


How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs

ucsc-vlaa/vllm-safety-benchmark 27 Nov 2023

Different from prior studies, we shift our focus from evaluating standard performance to introducing a comprehensive safety evaluation suite, covering both out-of-distribution (OOD) generalization and adversarial robustness.
