MVBench is a comprehensive Multi-modal Video understanding Benchmark. It was introduced to evaluate the comprehension capabilities of Multi-modal Large Language Models (MLLMs), particularly their temporal understanding in dynamic video tasks. MVBench covers 20 challenging video tasks that cannot be effectively solved with a single frame. It introduces a novel static-to-dynamic method to define these temporal tasks: by transforming various static tasks into dynamic ones, it enables the systematic generation of video tasks that require a broad spectrum of temporal skills, ranging from perception to cognition.
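
Each MVBench sample is a multiple-choice question over a video clip (a question, candidate answers, and one correct option), so MLLMs are typically scored by per-task accuracy. The sketch below illustrates such an evaluation loop under assumed conventions: a per-task JSON annotation file with "question", "candidates", "answer", and "video" fields, and a placeholder model.answer(video_path, prompt) interface. These names are illustrative assumptions, not the official release format or API.

```python
import json
from pathlib import Path

# Minimal sketch of an MVBench-style evaluation loop (multiple-choice video QA).
# The annotation layout ("annotations/*.json"), the field names ("question",
# "candidates", "answer", "video"), and the model.answer(...) interface are
# assumptions for illustration; the official release may differ.

def evaluate_task(annotation_file: Path, video_root: Path, model) -> float:
    """Compute accuracy for one MVBench task given per-item annotations."""
    items = json.loads(annotation_file.read_text())
    correct = 0
    for item in items:
        video_path = video_root / item["video"]
        # Present candidates as lettered options, a common prompting scheme
        # for multiple-choice evaluation of MLLMs.
        options = [f"({chr(ord('A') + i)}) {c}" for i, c in enumerate(item["candidates"])]
        prompt = item["question"] + "\n" + "\n".join(options)
        prediction = model.answer(video_path, prompt)  # hypothetical model interface
        gold = chr(ord('A') + item["candidates"].index(item["answer"]))
        if prediction.strip().startswith((f"({gold})", gold)):
            correct += 1
    return correct / len(items)

if __name__ == "__main__":
    # Example: average accuracy over all per-task annotation files
    # (paths and the `model` object are placeholders).
    task_files = sorted(Path("annotations").glob("*.json"))
    # scores = [evaluate_task(f, Path("videos"), model) for f in task_files]
    # print(sum(scores) / len(scores))
```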

License


  • Unknown
