…The data set consists of approximately 380,000 video segments, each about 19 s long, automatically selected to feature objects in natural settings without editing or post-processing, with a recording quality […]. All video segments were human-annotated at 1 frame per second with high-precision classification labels and bounding boxes.
7 PAPERS • 1 BENCHMARK
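Because labels are given at 1 frame per second, the number of annotated frames scales with total video length. The sketch below is a back-of-the-envelope estimate only. It assumes every segment lasts exactly the stated "about 19 s" and carries one labeled frame per second. It is not a published count, and the helper name is hypothetical.

```python
def estimate_annotated_frames(num_segments: int, mean_duration_s: float,
                              rate_hz: float = 1.0) -> int:
    """Rough estimate: segments x seconds per segment x labeled frames per second.

    Hypothetical helper; assumes a uniform segment length and a constant
    annotation rate, so the result is an order-of-magnitude figure.
    """
    return round(num_segments * mean_duration_s * rate_hz)


print(estimate_annotated_frames(380_000, 19.0))  # → 7220000
```

Under these assumptions the data set holds on the order of seven million labeled frames. The true figure depends on the actual distribution of segment lengths.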
…Each video is split into short action segments (mean duration 3.7 s), each with specific start and end times and a verb-noun annotation describing the action (e.g., ‘open fridge’).
36 PAPERS • 3 BENCHMARKS
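An action segment as described above pairs a time span with a verb and a noun. The following is a minimal sketch of such a record. The field names (`start_s`, `end_s`, `verb`, `noun`) and the second example label are illustrative assumptions, not the data set's actual annotation schema.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ActionSegment:
    """One annotated action: a time span plus a verb-noun label (hypothetical schema)."""
    start_s: float
    end_s: float
    verb: str
    noun: str

    def __post_init__(self) -> None:
        # A segment must have positive length.
        if self.end_s <= self.start_s:
            raise ValueError("end time must come after start time")

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s

    @property
    def label(self) -> str:
        # Combine verb and noun into a readable action label.
        return f"{self.verb} {self.noun}"


def mean_duration(segments: list[ActionSegment]) -> float:
    """Average segment length in seconds."""
    return mean(s.duration_s for s in segments)


segments = [
    ActionSegment(12.0, 14.5, "open", "fridge"),
    ActionSegment(14.5, 19.5, "take", "milk"),  # illustrative label
]
print(segments[0].label)        # → open fridge
print(mean_duration(segments))  # → 3.75
```

Keeping verb and noun as separate fields rather than a single string makes it easy to group or count actions by verb or by object independently.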
USC-GRAD-STDdb comprises 115 video segments containing more than 25,000 annotated frames at HD 720p resolution (≈1280×720), featuring small objects of interest whose area ranges from 16 (≈4×4) to 256 (≈16×16) pixels.
1 PAPER • 1 BENCHMARK
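To make the size range concrete, the sketch below checks whether a bounding box falls within the stated 16 to 256 px² band and reports how much of a 1280×720 frame it covers. Treating both bounds as inclusive is an assumption, as are the helper names.

```python
FRAME_W, FRAME_H = 1280, 720     # HD 720p frame size
MIN_AREA, MAX_AREA = 16, 256     # px^2; inclusive bounds are an assumption


def box_area(w: int, h: int) -> int:
    """Area of an axis-aligned bounding box in pixels."""
    return w * h


def in_small_object_range(w: int, h: int) -> bool:
    """True if the box area lies in the stated small-object band (hypothetical helper)."""
    return MIN_AREA <= box_area(w, h) <= MAX_AREA


def frame_fraction(w: int, h: int) -> float:
    """Fraction of the full frame covered by the box."""
    return box_area(w, h) / (FRAME_W * FRAME_H)


print(in_small_object_range(4, 4))       # → True
print(in_small_object_range(17, 16))     # → False (272 px^2)
print(f"{frame_fraction(16, 16):.3%}")   # → 0.028%
```

Even the largest objects in this range cover well under a tenth of a percent of the frame, which is what makes detection on this data set difficult.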