4 dataset results for art AND Visual Question Answering (VQA)

The question-answer (QA) pairs are automatically generated using state-of-the-art question generation methods based on paintings and comments provided in an existing art understanding dataset.

6 PAPERS • NO BENCHMARKS YET

InfoSeek (Visual Information Seeking)

…Our findings reveal that state-of-the-art pre-trained multi-modal models (e.g., PaLI-X, BLIP2, etc.) face challenges in answering visual information-seeking questions, but fine-tuning on the InfoSeek dataset

17 PAPERS • 2 BENCHMARKS

QLEVR

…We describe how the dataset was created and present a first evaluation of state-of-the-art visual question-answering models, showing that QLEVR presents a formidable challenge to our current models.

1 PAPER • 1 BENCHMARK

BenchLMM (BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models)

…Utilizing BenchLMM, we comprehensively evaluate state-of-the-art LMMs and reveal: 1) LMMs generally suffer performance degradation when working with other styles; 2) An LMM performs better than another

9 PAPERS • 1 BENCHMARK
