Visual Reasoning

212 papers with code • 12 benchmarks • 41 datasets

The ability to understand actions and reasoning associated with visual images.

Libraries

Use these libraries to find Visual Reasoning models and implementations (7 libraries in total).

MMCode: Evaluating Multi-Modal Code Large Language Models with Visually Rich Programming Problems

happylkx/mmcode 15 Apr 2024

Programming often involves converting detailed and complex specifications into code, a process during which developers typically utilize visual aids to more effectively convey concepts.

6 stars

Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models

lavi-lab/visual-table 27 Mar 2024

When visual tables serve as standalone visual representations, our model can closely match or even beat state-of-the-art (SOTA) MLLMs that are built on CLIP visual embeddings.

7 stars

How Far Are We from Intelligent Visual Deductive Reasoning?

apple/ml-rpm-bench 7 Mar 2024

Vision-Language Models (VLMs) such as GPT-4V have recently made remarkable progress on diverse vision-language tasks.

13 stars

Slot Abstractors: Toward Scalable Abstract Visual Reasoning

slotabstractor/slotabstractor 6 Mar 2024

Abstract visual reasoning is a characteristically human ability, allowing the identification of relational patterns that are abstracted away from object features, and the systematic generalization of those patterns to unseen problems.

1 star

What Is Missing in Multilingual Visual Reasoning and How to Fix It

yueqis/multilingual_visual_reasoning 3 Mar 2024

NLP models today strive to support multiple languages and modalities, improving accessibility for diverse users.

0 stars

Peacock: A Family of Arabic Multimodal Large Language Models and Benchmarks

ubc-nlp/peacock 1 Mar 2024

Multimodal large language models (MLLMs) have proven effective in a wide range of tasks requiring complex reasoning and linguistic comprehension.

14 stars

Revisiting Disentanglement in Downstream Tasks: A Study on Its Necessity for Abstract Visual Reasoning

richard-coder-nai/disentanglement-lib-necessity 1 Mar 2024

This paper further investigates the necessity of disentangled representations in downstream applications.

1 star

PALO: A Polyglot Large Multimodal Model for 5B People

mbzuai-oryx/palo 22 Feb 2024

PALO offers visual reasoning capabilities in 10 major languages (English, Chinese, Hindi, Spanish, French, Arabic, Bengali, Russian, Urdu, and Japanese), which together are spoken by about 5B people (65% of the world's population).

70 stars

Visual Reasoning in Object-Centric Deep Neural Networks: A Comparative Cognition Approach

GuillermoPuebla/object-centric-reasoning 20 Feb 2024

To this end, these models use several kinds of attention mechanisms to segregate the individual objects in a scene from the background and from other objects.

0 stars

CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations

thudm/cogcom 6 Feb 2024

Vision-Language Models (VLMs) have demonstrated their widespread viability thanks to extensive training in aligning visual instructions to answers.

119 stars