MVBench

Introduced by Li et al. in MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

MVBench is a comprehensive Multi-modal Video understanding Benchmark. It was introduced to evaluate the comprehension capabilities of Multi-modal Large Language Models (MLLMs), particularly their temporal understanding in dynamic video tasks. MVBench covers 20 challenging video tasks that cannot be solved effectively from a single frame. To define these temporal tasks, it introduces a novel static-to-dynamic method: various static tasks are transformed into dynamic ones, which enables the systematic generation of video tasks that require a broad spectrum of temporal skills, ranging from perception to cognition.
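Because the benchmark spans 20 separate tasks, results are naturally summarized per task and then averaged. The snippet below is a minimal sketch of that aggregation, not the authors' evaluation code. It assumes each prediction can be scored as exactly right or wrong against a reference answer; the record layout and the task names in the usage example are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def per_task_accuracy(records):
    """Return {task: accuracy} for an iterable of (task, predicted, expected) tuples.

    The tuple layout is an assumption made for this sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for task, predicted, expected in records:
        total[task] += 1
        if predicted == expected:
            correct[task] += 1
    return {task: correct[task] / total[task] for task in total}


def macro_average(accuracies):
    """Average the per-task accuracies so that every task counts equally."""
    if not accuracies:
        raise ValueError("no tasks to average")
    return sum(accuracies.values()) / len(accuracies)


# Hypothetical records: (task name, model answer, reference answer).
records = [
    ("action_order", "A", "A"),
    ("action_order", "B", "A"),
    ("object_count", "C", "C"),
]
scores = per_task_accuracy(records)
overall = macro_average(scores)
```

Averaging over tasks rather than over individual questions keeps a task with many questions from dominating the overall score. Whether that matches the official protocol should be checked against the benchmark's own evaluation scripts.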

Benchmarks

Task	Dataset Variant	Best Model
Video Question Answering	MVBench	PLLaVA

Tasks

Video Question Answering

Similar Datasets

MSVD-QA

InfiMM-Eval

BenchLMM

VideoInstruct

License

Unknown
