UCF101 dataset is an extension of UCF50 and consists of 13,320 video clips, which are classified into 101 categories. These 101 categories can be classified into 5 types (Body motion, Human-human interactions, Human-object interactions, Playing musical instruments and Sports). The total length of these video clips is over 27 hours. All the videos are collected from YouTube and have a fixed frame rate of 25 FPS with the resolution of 320 × 240.
1,605 PAPERS • 22 BENCHMARKS
The Kinetics dataset is a large-scale, high-quality dataset for human action recognition in videos. The dataset consists of around 500,000 video clips covering 600 human action classes with at least 600 video clips for each action class. Each video clip lasts around 10 seconds and is labeled with a single action class. The videos are collected from YouTube.
1,174 PAPERS • 28 BENCHMARKS
The HMDB51 dataset is a large collection of realistic videos from various sources, including movies and web videos. The dataset is composed of 6,766 video clips from 51 action categories (such as “jump”, “kiss” and “laugh”), with each category containing at least 101 clips. The original evaluation scheme uses three different training/testing splits. In each split, each action class has 70 clips for training and 30 clips for testing. The average accuracy over these three splits is used to measure the final performance.
768 PAPERS • 11 BENCHMARKS
Kinetics-100 is a dataset split created from the Kinetics dataset to evaluate the performance of few-shot action recognition models. 100 classes are randomly selected from a total of 400 categories, each composed of 100 examples. The 100 classes are further split into 64, 12, and 24 non-overlapping classes to use as the meta-training set, meta-validation set, and meta-testing set, respectively. Link to the selected samples can be found here: https://github.com/ffmpbgrnn/CMN/tree/master/kinetics-100
7 PAPERS • 1 BENCHMARK