75 dataset results for Action Recognition AND Videos

The effort to create a non-trivial, publicly available dataset for action recognition was initiated at the KTH Royal Institute of Technology in 2004. The KTH dataset is one of the standard datasets for this task and contains six actions: walk, jog, run, box, hand-wave, and hand-clap. To capture variation in how an action is performed, each action is performed by 25 different individuals, and the setting is systematically altered for each action per actor. Setting variations include: outdoor (s1), outdoor with scale variation (s2), outdoor with different clothes (s3), and indoor (s4). These variations test the ability of each algorithm to identify actions independently of the background, the appearance of the actors, and the scale of the actors.

257 PAPERS • 1 BENCHMARK
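The KTH design described above (every subject performs every action in every setting) can be sketched as a small enumeration. This is only an illustration of the nominal design: the names `ACTIONS`, `SCENARIOS`, `nominal_recordings`, and `held_out_split` are hypothetical, the count is what the description implies rather than the number of files in any release, and the held-out-setting split is one possible way to probe background and scale robustness, not an official protocol.

```python
from itertools import product

# Values taken from the description above; the identifiers are illustrative.
ACTIONS = ["walk", "jog", "run", "box", "hand-wave", "hand-clap"]
SCENARIOS = {
    "s1": "outdoor",
    "s2": "outdoor with scale variation",
    "s3": "outdoor with different clothes",
    "s4": "indoor",
}
NUM_SUBJECTS = 25


def nominal_recordings():
    """Enumerate every (subject, action, scenario) combination in the design."""
    subjects = range(1, NUM_SUBJECTS + 1)
    # Iterating over a dict yields its keys, i.e. the scenario codes s1..s4.
    return list(product(subjects, ACTIONS, SCENARIOS))


def held_out_split(recordings, test_scenario):
    """Hypothetical split: train on three settings, test on the remaining one."""
    train = [r for r in recordings if r[2] != test_scenario]
    test = [r for r in recordings if r[2] == test_scenario]
    return train, test


if __name__ == "__main__":
    recordings = nominal_recordings()
    train, test = held_out_split(recordings, "s2")
    print(len(recordings), len(train), len(test))
```

Holding out a whole setting, rather than sampling recordings at random, is what makes such a split measure independence from background, clothing, or scale.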

Action Recognition in the Dark

Action Recognition in the Dark (ARID)

ARID is a dataset for action recognition in dark videos. It consists of over 3,780 video clips with 11 action categories.

5 PAPERS • NO BENCHMARKS YET

CATER

CATER is rendered synthetically using a library of standard 3D objects and tests the ability to recognize compositions of object movements that require long-term reasoning.

47 PAPERS • 3 BENCHMARKS

COIN

The COIN dataset (a large-scale dataset for COmprehensive INstructional video analysis) consists of 11,827 videos covering 180 different tasks in 12 domains (e.g., vehicles, gadgets, etc.) related to daily life. The videos are all collected from YouTube. The average length of a video is 2.36 minutes. Each video is labelled with 3.91 step segments on average, where each segment lasts 14.91 seconds on average. In total, the dataset contains 476 hours of video, with 46,354 annotated segments.

79 PAPERS • 2 BENCHMARKS
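The COIN figures quoted above can be cross-checked with a little arithmetic. The sketch below derives segments per video, total footage, and annotated footage from the quoted counts and averages; the function and key names are hypothetical, and because the inputs are rounded averages, the derived totals are only expected to land near the quoted ones (for example, roughly 465 hours against the quoted 476).

```python
# Figures quoted in the COIN description above.
NUM_VIDEOS = 11_827
NUM_SEGMENTS = 46_354
AVG_VIDEO_MINUTES = 2.36
AVG_SEGMENT_SECONDS = 14.91


def coin_summary():
    """Derive aggregate statistics from the quoted counts and averages."""
    segments_per_video = NUM_SEGMENTS / NUM_VIDEOS
    total_video_hours = NUM_VIDEOS * AVG_VIDEO_MINUTES / 60
    annotated_hours = NUM_SEGMENTS * AVG_SEGMENT_SECONDS / 3600
    return {
        "segments_per_video": segments_per_video,
        "total_video_hours": total_video_hours,
        "annotated_hours": annotated_hours,
        # Share of the footage that falls inside an annotated step segment.
        "annotated_fraction": annotated_hours / total_video_hours,
    }


if __name__ == "__main__":
    for name, value in coin_summary().items():
        print(f"{name}: {value:.2f}")
```

The annotated fraction is a useful quantity when sizing a step-localization experiment, since it indicates how much of each video is background rather than a labelled step.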

Charades-Ego

Contains 68,536 activity instances in 68.8 hours of first- and third-person video, making it one of the largest and most diverse egocentric datasets available. Charades-Ego also shares activity classes, scripts, and methodology with the Charades dataset, which consists of an additional 82.3 hours of third-person video with 66,500 activity instances.

25 PAPERS • 1 BENCHMARK

Composable activities dataset

The Composable activities dataset consists of 693 videos that contain activities in 16 classes performed by 14 actors. Each activity is composed of 3 to 11 atomic actions. RGB-D data for each sequence is captured using a Microsoft Kinect sensor, and the positions of relevant body joints are estimated.

3 PAPERS • NO BENCHMARKS YET

Drive&Act

The Drive&Act dataset is a state-of-the-art multimodal benchmark for driver behavior recognition. The dataset includes 3D skeletons in addition to frame-wise hierarchical labels of 9.6 million frames captured from 6 different views and in 3 modalities (RGB, IR, and depth).

18 PAPERS • 1 BENCHMARK

Drone-Action

Drone-Action (Drone-Action: An Outdoor Recorded Drone Video Dataset for Action Recognition)

Website: https://asankagp.github.io/droneaction/

2 PAPERS • 1 BENCHMARK

EgoHands

The EgoHands dataset contains 48 Google Glass videos of complex, first-person interactions between two people. The main intention of this dataset is to enable better, data-driven approaches to understanding hands in first-person computer vision. The dataset offers

30 PAPERS • NO BENCHMARKS YET

HAA500 (Human-Centric Atomic Action Dataset)

HAA500 is a manually annotated human-centric atomic action dataset for action recognition on 500 classes with over 591k labeled frames. Unlike existing atomic action datasets, where coarse-grained atomic actions were labeled with action verbs, e.g., "Throw", HAA500 contains fine-grained atomic actions where only consistent actions fall under the same label, e.g., "Baseball Pitching" vs "Free Throw in Basketball", to minimize ambiguities in action classification. HAA500 has been carefully curated to capture the movement of human figures with less spatio-temporal label noise, to greatly enhance the training of deep neural networks.

8 PAPERS • 1 BENCHMARK

HACS (Human Action Clips and Segments)

HACS is a dataset for human action recognition. It uses a taxonomy of 200 action classes, which is identical to that of the ActivityNet-v1.3 dataset. It has 504K videos retrieved from YouTube. Each one is strictly shorter than 4 minutes, and the average length is 2.6 minutes. A total of 1.5M clips of 2-second duration are sparsely sampled by methods based on both uniform randomness and consensus/disagreement of image classifiers. 0.6M and 0.9M clips are annotated as positive and negative samples, respectively.

65 PAPERS • 2 BENCHMARKS

Jester (Gesture Recognition)

The Jester Gesture Recognition dataset includes 148,092 labeled video clips of humans performing basic, pre-defined hand gestures in front of a laptop camera or webcam. It is designed for training machine learning models to recognize human hand gestures such as sliding two fingers down, swiping left or right, and drumming fingers.

16 PAPERS • 6 BENCHMARKS

Kinetics-Sound

This is a subset of Kinetics-400, introduced in "Look, Listen and Learn" by Relja Arandjelovic and Andrew Zisserman.

1 PAPER • NO BENCHMARKS YET

MCAD

MCAD (Multi-Camera Action Dataset)

MCAD is designed to evaluate the open-view classification problem in a surveillance environment. In total, MCAD contains 14,298 action samples from 18 action categories, which are performed by 20 subjects and independently recorded with 5 cameras.

1 PAPER • NO BENCHMARKS YET

MMAct

MMAct is a large-scale dataset for multi/cross-modal action understanding. This dataset has been recorded from 20 distinct subjects with seven different modalities: RGB videos, keypoints, acceleration, gyroscope, orientation, Wi-Fi, and pressure signal. The dataset consists of more than 36k video clips for 37 action classes covering a wide range of daily life activities, such as desktop-related and check-in-based ones, in four distinct scenarios.

19 PAPERS • 1 BENCHMARK

MTL-AQA

A new multitask action quality assessment (AQA) dataset, the largest to date, comprising more than 1600 diving samples. It contains detailed annotations for fine-grained action recognition, commentary generation, and estimating the AQA score. Videos from multiple angles are provided wherever available.

25 PAPERS • 2 BENCHMARKS

Metaphorics

Metaphorics is a newly introduced non-contextual skeleton action dataset. All the datasets introduced so far in skeleton-based human action recognition have categories based only on verb-based actions.

1 PAPER • NO BENCHMARKS YET

Mouse Reach

A large, annotated video dataset of mice performing a sequence of actions. The dataset was collected and labeled by experts for the purpose of neuroscience research.

1 PAPER • NO BENCHMARKS YET

NTU RGB+D 120

NTU RGB+D 120 is a large-scale dataset for RGB+D human action recognition, which is collected from 106 distinct subjects and contains more than 114 thousand video samples and 8 million frames. This dataset contains 120 different action classes including daily, mutual, and health-related activities.

90 PAPERS • 8 BENCHMARKS

PA-HMDB51

PA-HMDB51 (Privacy Annotated HMDB51)

The Privacy Annotated HMDB51 (PA-HMDB51) dataset is a video-based dataset for evaluating privacy protection in visual action recognition algorithms. The dataset contains both target task labels (action) and selected privacy attributes (skin color, face, gender, nudity, and relationship) annotated on a per-frame basis.

7 PAPERS • NO BENCHMARKS YET

RareAct

RareAct is a video dataset of unusual actions, including actions like “blend phone”, “cut keyboard” and “microwave shoes”. It aims at evaluating the zero-shot and few-shot compositionality of action recognition models for unlikely compositions of common action verbs and object nouns. It contains 122 different actions which were obtained by combining verbs and nouns rarely co-occurring together in the large-scale textual corpus from HowTo100M, but that frequently appear separately.

9 PAPERS • 1 BENCHMARK

Skeletics 152

A curated and 3-D pose-annotated subset of RGB videos sourced from Kinetics-700, a large-scale action dataset.

2 PAPERS • 1 BENCHMARK

Skeleton-Mimetics

A dataset derived from the recently introduced Mimetics dataset.

2 PAPERS • 2 BENCHMARKS

Surveillance Camera Fight Dataset

The dataset is collected from YouTube videos that contain fight instances. Some non-fight sequences from regular surveillance camera videos are also included.

* There are 300 videos in total: 150 fight and 150 non-fight.
* Videos are 2 seconds long.
* Only the fight-related parts are included in the samples.

1 PAPER • NO BENCHMARKS YET

THUMOS14

The THUMOS14 (THUMOS 2014) dataset is a large-scale video dataset that includes 1,010 videos for validation and 1,574 videos for testing from 20 classes. Among all the videos, 220 in the validation set and 212 in the testing set have temporal annotations.

290 PAPERS • 20 BENCHMARKS

TinyVIRAT

TinyVIRAT contains natural low-resolution activities. The videos are extracted from surveillance footage, which makes them realistic and more challenging, and the actions in them have multiple labels.

4 PAPERS • NO BENCHMARKS YET

VIDIMU: Multimodal video and IMU kinematic dataset on daily life activities using affordable devices

VIDIMU: Multimodal video and IMU kinematic dataset on daily life activities using affordable devices (https://zenodo.org/record/8210563)

Human activity recognition and clinical biomechanics are challenging problems in physical telerehabilitation medicine. However, most publicly available datasets on human body movements cannot be used to study both problems in an out-of-the-lab movement acquisition setting. The objective of the VIDIMU dataset is to pave the way towards affordable patient tracking solutions for remote recognition and kinematic analysis of daily life activities.

0 PAPERS • NO BENCHMARKS YET
