MMT-Bench is a comprehensive benchmark designed to evaluate Large Vision-Language Models (LVLMs) across a wide array of multimodal tasks that require expert knowledge as well as deliberate visual recognition, localization, reasoning, and planning¹. It includes 31,325 meticulously curated multi-choice visual questions from various scenarios such as vehicle driving and embodied navigation, covering 32 core meta-tasks and 162 subtasks in multimodal understanding¹.
The benchmark is designed to assess the multitask performance of LVLMs and provides a task map that locates each task relative to the others, which makes it possible to distinguish in-domain from out-of-domain tasks when evaluating a model. This breadth supports a thorough assessment of a model's multimodal understanding¹.
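To make the multiple-choice item format concrete, the sketch below shows how per-subtask accuracy might be computed over such records. The field names (`image`, `question`, `options`, `answer`, `subtask`), the annotation file name, and the `predict` stub are illustrative assumptions, not the benchmark's actual schema or evaluation API.

```python
# Hypothetical sketch of scoring an LVLM on multiple-choice visual questions
# in the MMT-Bench style. Field names, file layout, and the predict() stub
# are illustrative assumptions, not the benchmark's official interface.
import json
from collections import defaultdict

def predict(image_path: str, question: str, options: dict[str, str]) -> str:
    """Placeholder for an LVLM call; should return an option letter such as 'A'."""
    return "A"  # stub: always answers 'A'

def evaluate(annotation_file: str) -> dict[str, float]:
    """Compute per-subtask accuracy over multiple-choice items."""
    with open(annotation_file, encoding="utf-8") as f:
        items = json.load(f)  # assumed: a list of dicts with the fields used below

    correct, total = defaultdict(int), defaultdict(int)
    for item in items:
        pred = predict(item["image"], item["question"], item["options"])
        subtask = item["subtask"]
        total[subtask] += 1
        if pred == item["answer"]:
            correct[subtask] += 1

    return {name: correct[name] / total[name] for name in total}

if __name__ == "__main__":
    scores = evaluate("mmt_bench_val.json")  # hypothetical annotation file
    for subtask, acc in sorted(scores.items()):
        print(f"{subtask}: {acc:.3f}")
```

Reporting accuracy per subtask (rather than a single pooled number) mirrors how a benchmark with 162 subtasks would typically be summarized, since aggregate scores can hide large per-task differences.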
(1) MMT-Bench. https://mmt-bench.github.io/
(2) OpenGVLab/MMT-Bench (ICML 2024), GitHub. https://github.com/OpenGVLab/MMT-Bench
(3) OpenGVLab/MMT-Bench, Giters. https://giters.com/OpenGVLab/MMT-Bench