…Each video is labelled with 3.91 step segments on average, with each segment lasting 14.91 seconds on average. In total, the dataset contains 476 hours of video, with 46,354 annotated segments.
78 PAPERS • 2 BENCHMARKS
EPIC-SOUNDS includes 78.4k categorised and 39.2k non-categorised segments of audible events and actions, distributed across 44 classes.
7 PAPERS • 2 BENCHMARKS
…Sequences are annotated with more than 100K coarse and 1M fine-grained action segments, and 18M 3D hand poses. We benchmark on three action understanding tasks: recognition, anticipation and temporal segmentation. Additionally, we propose a novel task of detecting mistakes.
38 PAPERS • 4 BENCHMARKS
…The dataset offers high-quality, pixel-level segmentations of hands, the possibility to semantically distinguish between the observer’s hands and someone else’s hands, as well as left and right hands.
30 PAPERS • NO BENCHMARKS YET
…Each video is split into short action segments (mean duration is 3.7s) with specific start and end times and a verb and noun annotation describing the action (e.g. ‘open fridge’).
35 PAPERS • 3 BENCHMARKS
…More specifically, the dataset contains 50 hours of annotated videos to localize relevant animal behavior segments in long videos for the video grounding task, and 30K video sequences for the fine-grained…
14 PAPERS • 2 BENCHMARKS
…Each case comprises kinematic data, a video, a semantic segmentation of each frame, and a workflow annotation.
3 PAPERS • 6 BENCHMARKS
…Video acquisition: 1920x1080 at 12.00 fps. 11 training videos and 9 validation/test videos. 8,857 video segments temporally annotated, indicating the verbs which describe the actions performed. 64,349 active…
14 PAPERS • 3 BENCHMARKS
…We benchmark four foundational video understanding tasks: action recognition, action segmentation, object detection and multi-object tracking.
1 PAPER • NO BENCHMARKS YET
…EPIC-KITCHENS-55), EPIC-KITCHENS-100 has been annotated using a novel pipeline that allows denser (54% more actions per minute) and more complete annotations of fine-grained actions (+128% more action segments)…
135 PAPERS • 7 BENCHMARKS
…AVA Speech densely annotates audio-based speech activity in AVA v1.0 videos, and explicitly labels 3 background noise conditions, resulting in ~46K labeled segments spanning 45 hours of data.
94 PAPERS • 7 BENCHMARKS