2 dataset results for art AND Visual Question Answering (VQA) AND Texts

…Our findings reveal that state-of-the-art pre-trained multi-modal models (e.g., PaLI-X, BLIP2, etc.) face challenges in answering visual information-seeking questions, but fine-tuning on the InfoSeek dataset

17 PAPERS • 2 BENCHMARKS

BenchLMM (BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models)

…Utilizing BenchLMM, we comprehensively evaluate state-of-the-art LMMs and reveal: 1) LMMs generally suffer performance degradation when working with other styles; 2) An LMM performs better than another

9 PAPERS • 1 BENCHMARK

