Search Results for author: Xinze Guan

Found 1 paper, 0 papers with code

Muffin or Chihuahua? Challenging Large Vision-Language Models with Multipanel VQA

No code implementations · 29 Jan 2024 · Yue Fan, Jing Gu, Kaiwen Zhou, Qianqi Yan, Shan Jiang, Ching-Chen Kuo, Xinze Guan, Xin Eric Wang

Our evaluation shows that questions in the MultipanelVQA benchmark pose significant challenges to the state-of-the-art Large Vision Language Models (LVLMs) tested, even though humans can attain approximately 99% accuracy on these questions.

Benchmarking · Image Comprehension · +4
