4 dataset results for segmentation AND Video Retrieval AND Texts

…Mechanical Turk (AMT) is used to collect annotations on HowTo100M videos. 30k 60-second clips are randomly sampled from 9,421 videos, and each clip is presented to the turkers, who are asked to select a video segment. After this segment selection step, another group of workers is asked to write descriptions for each displayed segment. These final video segments are 10-20 seconds long on average, and the length of queries ranges from 8 to 20 words.

9 PAPERS • NO BENCHMARKS YET

HiREST (HIerarchical REtrieval and STep-captioning)

…The dataset covers video retrieval, moment retrieval, and two novel tasks: moment segmentation and step captioning.

2 PAPERS • NO BENCHMARKS YET

How2QA

…Each worker is assigned one video segment and asked to write one question with four answer candidates (one correct and three distractors).

22 PAPERS • 2 BENCHMARKS

DiDeMo (Distinct Describable Moments)

…The videos in the dataset are divided into 5-second segments to reduce the complexity of annotation.

183 PAPERS • 3 BENCHMARKS
