This dataset contains 118,081 short video clips extracted from 202 movies. Each clip is paired with a caption, sourced either from the movie script or from transcribed DVS (descriptive video service) narration for the visually impaired. The validation set contains 7,408 clips, and evaluation is performed on a test set of 1,000 videos drawn from movies disjoint from the training and validation sets.
113 PAPERS • 4 BENCHMARKS
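As a rough sketch of the record layout implied by the description above, the following models a single clip-caption entry and the stated split sizes; the class names, fields, and `CaptionSource` enum are illustrative assumptions, not the dataset's official schema.

```python
from dataclasses import dataclass
from enum import Enum

class CaptionSource(Enum):
    SCRIPT = "movie_script"   # caption extracted from the movie script
    DVS = "dvs"               # transcribed descriptive video service narration

@dataclass
class ClipRecord:
    movie_id: str             # one of the 202 source movies
    clip_path: str            # path to the extracted short clip
    caption: str              # one caption per clip
    source: CaptionSource     # where the caption came from

# Clip counts as stated in the description; test movies are
# disjoint from those used for training and validation.
TOTAL_CLIPS = 118_081
VAL_CLIPS = 7_408
TEST_CLIPS = 1_000
```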
A large-scale dataset for retrieval and event localisation in video. A unique feature of the dataset is the availability of two audio tracks for each video: the original audio, and a high-quality spoken description of the visual content.
11 PAPERS • 1 BENCHMARK
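A minimal sketch of the dual-audio-track structure noted above, with hypothetical field names; the timestamped event segments are an assumption based on the dataset's stated event-localisation use, not a published format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class VideoEntry:
    video_id: str
    original_audio_path: str     # the video's original audio track
    description_audio_path: str  # high-quality spoken description of the
                                 # visual content (the second track)
    # (start_sec, end_sec, text) spans for localised events; assumed shape
    events: List[Tuple[float, float, str]] = field(default_factory=list)
```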
VTC is a large-scale multimodal dataset containing ~300k video-caption pairs, each accompanied by comments, that can be used for multimodal representation learning.
2 PAPERS • NO BENCHMARKS YET
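A hedged sketch of one VTC example, under the assumption that each video-caption pair carries a variable-length list of comments; the names are illustrative, not the dataset's schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class VTCExample:
    video_path: str
    caption: str
    comments: List[str] = field(default_factory=list)  # accompanying comments

# Iterating ~300k such examples yields (video, caption, comments)
# triples for multimodal representation learning.
```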
A short video clip may contain a progression of multiple events and an interesting story line. A human viewer needs to capture the events in every shot and associate them with one another to understand the story behind the clip.
1 PAPER • 3 BENCHMARKS