…Mechanical Turk (AMT) is used to collect annotations on HowTo100M videos. 30k 60-second clips are randomly sampled from 9,421 videos, and each clip is presented to the turkers, who are asked to select a video segment. After this segment selection step, another group of workers is asked to write descriptions for each displayed segment. These final video segments are 10-20 seconds long on average, and the length of queries ranges from 8 to 20 words.
9 PAPERS • NO BENCHMARKS YET
…The dataset consists of video retrieval, moment retrieval, and two novel tasks: moment segmentation and step captioning.
2 PAPERS • NO BENCHMARKS YET
…Each worker is assigned one video segment and asked to write one question with four answer candidates (one correct and three distractors).
22 PAPERS • 2 BENCHMARKS
…The videos in the dataset are divided into 5-second segments to reduce the complexity of annotation.
183 PAPERS • 3 BENCHMARKS