MSRVTT-QA

MSRVTT-QA is a benchmark for Visual Question Answering (VQA) built on the MSR-VTT (Microsoft Research Video to Text) dataset. It evaluates how well models answer questions about the videos in MSR-VTT. Question answering is one of several tasks MSR-VTT is used for, alongside Video Retrieval, Video Captioning, Zero-Shot Video Question Answering, Zero-Shot Video Retrieval, and Text-to-Video Generation.

Benchmarks

Task	Dataset Variant	Best Model
Visual Question Answering (VQA)	MSRVTT-QA	VLAB
Zero-Shot Video Question Answer	MSRVTT-QA	PLLaVA
Video Question Answering	MSRVTT-QA	Mirasol3B
Visual Question Answering	MSRVTT-QA	Aurora
Zero-Shot Learning	MSRVTT-QA	HiTeA
Zeroshot Video Question Answer	MSRVTT-QA	FrozenBiLM