…Each worker is assigned one video segment and asked to write one question with four answer candidates (one correct and three distractors).
22 PAPERS • 2 BENCHMARKS
…The videos are densely annotated with six types of labels: object and point tracks, temporal action and sound segments, multiple-choice video question-answers and grounded video question-answers.
4 PAPERS • NO BENCHMARKS YET