MMT-Bench is a comprehensive benchmark designed to evaluate Large Vision-Language Models (LVLMs) across a wide array of multimodal tasks that require expert knowledge as well as deliberate visual recognition, localization, reasoning, and planning¹. It includes 31,325 meticulously curated multi-choice visual questions from various scenarios such as vehicle driving and embodied navigation, covering 32 core meta-tasks and 162 subtasks in multimodal understanding¹.

Because its tasks are organized into a task map, MMT-Bench also supports assessing the multitask performance of LVLMs on both in-domain and out-of-domain tasks, enabling a thorough evaluation of a model's multimodal understanding capabilities¹.
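To illustrate how scores on a benchmark with this structure are typically aggregated, the sketch below computes per-subtask accuracy for multi-choice predictions and averages subtask scores within each meta-task. The record fields (`meta_task`, `subtask`, `answer`, `prediction`) and the aggregation rule are assumptions made for illustration, not MMT-Bench's official evaluation code.

```python
# Illustrative sketch (not the official MMT-Bench evaluation code): given model
# predictions for multi-choice questions tagged with a meta-task and subtask,
# compute per-subtask accuracy and average subtask scores within each meta-task.
# The field names below are assumptions, not the benchmark's actual schema.
from collections import defaultdict

def aggregate_scores(records):
    """records: iterable of dicts with keys meta_task, subtask, answer, prediction."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        key = (r["meta_task"], r["subtask"])
        total[key] += 1
        correct[key] += int(r["prediction"] == r["answer"])

    # Accuracy for each (meta-task, subtask) pair.
    subtask_acc = {k: correct[k] / total[k] for k in total}

    # Meta-task score: mean accuracy over its subtasks.
    meta = defaultdict(list)
    for (meta_task, _), acc in subtask_acc.items():
        meta[meta_task].append(acc)
    meta_acc = {m: sum(v) / len(v) for m, v in meta.items()}
    return subtask_acc, meta_acc

if __name__ == "__main__":
    demo = [
        {"meta_task": "visual_recognition", "subtask": "animal_recognition",
         "answer": "A", "prediction": "A"},
        {"meta_task": "visual_recognition", "subtask": "animal_recognition",
         "answer": "B", "prediction": "C"},
        {"meta_task": "localization", "subtask": "object_grounding",
         "answer": "D", "prediction": "D"},
    ]
    sub, meta = aggregate_scores(demo)
    print(sub)   # {('visual_recognition', 'animal_recognition'): 0.5, ...}
    print(meta)  # {'visual_recognition': 0.5, 'localization': 1.0}
```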

(1) MMT-Bench. https://mmt-bench.github.io/
(2) OpenGVLab/MMT-Bench (ICML 2024), GitHub. https://github.com/OpenGVLab/MMT-Bench
(3) OpenGVLab/MMT-Bench, Giters. https://giters.com/OpenGVLab/MMT-Bench
