The goal of this dataset is to probe video-language models' understanding of simple temporal relations such as "before" and "after". It is intended only as an evaluation set, not a training set.
2 PAPERS • 1 BENCHMARK
VTC is a large-scale multimodal dataset of roughly 300k video-caption pairs, each accompanied by user comments, and can be used for multimodal representation learning.
2 PAPERS • NO BENCHMARKS YET
Contains 5,193 video summaries of popular movies and TV series. SyMoN captures naturalistic storytelling videos made by human creators for human audiences, and offers higher story coverage and more frequent mental-state references than comparable video-language story datasets.
3 PAPERS • NO BENCHMARKS YET