Multimodal Reasoning

88 papers with code • 3 benchmarks • 9 datasets

Reasoning over inputs that span multiple modalities, such as images, text, and audio.

Most implemented papers

e-SNLI-VE: Corrected Visual-Textual Entailment with Natural Language Explanations

virginie-do/e-SNLI-VE 7 Apr 2020

The recently proposed SNLI-VE corpus for recognising visual-textual entailment is a large, real-world dataset for fine-grained multimodal reasoning.

WebQA: Multihop and Multimodal QA

WebQnA/WebQA_Baseline CVPR 2022

Scaling Visual Question Answering (VQA) to the open-domain and multi-hop nature of web searches requires fundamental advances in visual representation learning, knowledge aggregation, and language generation.

Dual Attention Networks for Multimodal Reasoning and Matching

iammrhelo/pytorch-vqa-dan CVPR 2017

We propose Dual Attention Networks (DANs) which jointly leverage visual and textual attention mechanisms to capture fine-grained interplay between vision and language.
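
A minimal, illustrative PyTorch sketch of one such dual-attention step is below, loosely patterned on the paper's r-DAN variant; the dimensions, scoring functions, and the multiplicative memory update are simplifying assumptions, not the exact published model:

```python
import torch
import torch.nn as nn

class DualAttentionStep(nn.Module):
    """One reasoning step: attend over image regions and text tokens,
    conditioned on a shared memory vector, then update the memory."""
    def __init__(self, dim=512):
        super().__init__()
        self.v_proj = nn.Linear(dim, dim)   # projects visual regions
        self.t_proj = nn.Linear(dim, dim)   # projects text tokens
        self.m_proj = nn.Linear(dim, dim)   # projects the shared memory
        self.v_score = nn.Linear(dim, 1)    # scores each region
        self.t_score = nn.Linear(dim, 1)    # scores each token

    def forward(self, m, V, T):
        # m: (B, dim) memory; V: (B, Nv, dim) regions; T: (B, Nt, dim) tokens
        q = self.m_proj(m).unsqueeze(1)                                  # (B, 1, dim)
        a_v = torch.softmax(self.v_score(torch.tanh(self.v_proj(V) + q)), dim=1)
        a_t = torch.softmax(self.t_score(torch.tanh(self.t_proj(T) + q)), dim=1)
        v_ctx = (a_v * V).sum(dim=1)        # attended visual context
        t_ctx = (a_t * T).sum(dim=1)        # attended textual context
        return m + v_ctx * t_ctx            # joint multiplicative update

step = DualAttentionStep()
m = torch.zeros(2, 512)                     # initial memory
V, T = torch.randn(2, 36, 512), torch.randn(2, 12, 512)
for _ in range(2):                          # K = 2 attention steps
    m = step(m, V, T)                       # m now encodes the joint context
```

Running the step multiple times lets the visual and textual attentions refine each other through the shared memory, which is the core idea behind the "fine-grained interplay" the paper targets.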

Multimodal Analogical Reasoning over Knowledge Graphs

zjunlp/MKG_Analogy 1 Oct 2022

Analogical reasoning is fundamental to human cognition and holds an important place in various fields.

Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Language Models

zoeyyao27/graph-of-thought 26 May 2023

We propose Graph-of-Thought (GoT) reasoning, which models human thought processes not only as a chain but also as a graph.
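
The chain-versus-graph distinction is easy to illustrate. The toy sketch below uses Python's standard-library `graphlib`; the node names are invented for illustration, and the actual GoT model encodes such a graph with a graph-attention module inside the network rather than as an explicit data structure:

```python
from graphlib import TopologicalSorter

# Chain-of-Thought: each thought depends only on its predecessor.
chain = ["read question", "extract facts", "compute", "answer"]

# Graph-of-Thought: thoughts can branch and merge, so dependencies form
# a DAG. Here "answer" aggregates two independent lines of reasoning.
thought_deps = {
    "extract facts": {"read question"},
    "compute": {"extract facts"},
    "check units": {"extract facts"},
    "answer": {"compute", "check units"},
}
order = list(TopologicalSorter(thought_deps).static_order())
print(order)  # one valid evaluation order over the thought graph
```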

MM-BigBench: Evaluating Multimodal Models on Multimodal Content Comprehension Tasks

declare-lab/mm-bigbench 13 Oct 2023

Our work complements research on the performance of MLLMs in multimodal comprehension tasks, achieving a more comprehensive and holistic evaluation of MLLMs.

Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning

declare-lab/puzzle-reasoning 6 Mar 2024

We present AlgoPuzzleVQA, a new dataset designed to challenge and evaluate the capabilities of multimodal language models in solving algorithmic puzzles that require visual understanding, language understanding, and complex algorithmic reasoning.

PuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Abstract Visual Patterns

declare-lab/llm-puzzletest 20 Mar 2024

To diagnose the reasoning challenges in large multimodal models, we progressively guide the models with our ground truth reasoning explanations for visual perception, inductive reasoning, and deductive reasoning.
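
The stage names (perception, inductive reasoning, deductive reasoning) come from the paper; the prompt wording and data below are illustrative assumptions, not the paper's exact templates. A hedged sketch of such progressive guidance:

```python
STAGES = ["perception", "inductive reasoning", "deductive reasoning"]

def build_prompt(question: str, gt: dict, revealed: int) -> str:
    """Reveal ground-truth explanations for the first `revealed` stages."""
    parts = [f"Question: {question}"]
    parts += [f"Hint ({s}): {gt[s]}" for s in STAGES[:revealed]]
    parts.append("Answer:")
    return "\n".join(parts)

gt = {s: "..." for s in STAGES}  # ground-truth explanations (elided)
# Query the model with 0, 1, 2, then 3 stages revealed; the stage whose
# hint produces the largest accuracy gain marks the model's bottleneck.
for k in range(len(STAGES) + 1):
    prompt = build_prompt("Which shape completes the pattern?", gt, k)
```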

MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models

mulab-mir/muchomusic 2 Aug 2024

Motivated by this, we introduce MuChoMusic, a benchmark for evaluating music understanding in audio-focused multimodal language models.

Distill Visual Chart Reasoning Ability from LLMs to MLLMs

hewei2001/reachqa 24 Oct 2024

Specifically, we employ text-based synthesis techniques to construct chart-plotting code and produce ReachQA, a dataset containing 3k reasoning-intensive charts and 20k Q&A pairs, to enhance both recognition and reasoning abilities.
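
The key design choice is that both the chart and the Q&A pairs originate from generated plotting code rather than from pixels. A rough sketch of that loop is below; `llm` is a hypothetical text-completion callable, and the real pipeline adds seed instructions, self-repair, and filtering steps not shown here:

```python
import pathlib
import subprocess

def synthesize_example(llm, topic: str, out_dir: str = "reachqa_raw"):
    """llm is a hypothetical text-completion callable: str -> str."""
    out = pathlib.Path(out_dir)
    out.mkdir(exist_ok=True)
    # 1. The LLM writes plotting code from text alone -- no images needed.
    code = llm(f"Write self-contained matplotlib code that saves a "
               f"reasoning-intensive chart about {topic} to '{out}/chart.png'.")
    (out / "plot.py").write_text(code)
    # 2. Executing the code renders the chart image for the dataset.
    subprocess.run(["python", str(out / "plot.py")], check=True)
    # 3. Q&A pairs are generated from the same code, so answers stay
    #    faithful to the rendered chart.
    qa = llm(f"Given this plotting code:\n{code}\nWrite one multi-step "
             f"reasoning question about the chart, with its answer.")
    return out / "chart.png", qa
```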