…Each video is labelled with an average of 3.91 step segments, each lasting 14.91 seconds on average. In total, the dataset contains 476 hours of video with 46,354 annotated segments.
78 PAPERS • 2 BENCHMARKS
A three-million-frame, multi-view furniture assembly video dataset that includes depth, atomic actions, object segmentation, and human pose.
16 PAPERS • NO BENCHMARKS YET
…The videos are densely annotated with six types of labels: object and point tracks, temporal action and sound segments, multiple-choice video question-answers and grounded video question-answers.
4 PAPERS • NO BENCHMARKS YET
…EPIC-KITCHENS-55), EPIC-KITCHENS-100 has been annotated using a novel pipeline that allows denser (54% more actions per minute) and more complete annotations of fine-grained actions (+128% more action segments…
134 PAPERS • 7 BENCHMARKS