The Composable Activities dataset consists of 693 videos containing activities in 16 classes performed by 14 actors. Each activity is composed of 3 to 11 atomic actions. RGB-D data for each sequence was captured with a Microsoft Kinect sensor, and estimated positions of relevant body joints are provided.
3 PAPERS • NO BENCHMARKS YET
DCASE2014 is an audio classification benchmark.
We consider the task of identifying human actions visible in online videos. We focus on the widespread genre of lifestyle vlogs, which consist of videos of people performing actions while verbally describing them. Our goal is to identify whether actions mentioned in the speech description of a video are visually present.
The PETRAW dataset is composed of 150 sequences of peg-transfer training sessions. The objective of a peg-transfer session is to transfer 6 blocks from the left side of the board to the right and back. Each block must be extracted from a peg with one hand, transferred to the other hand, and inserted on a peg on the other side of the board. All cases were acquired by a non-medical expert at the LTSI Laboratory of the University of Rennes. The dataset is divided into a training set of 90 cases and a test set of 60 cases. Each case comprises kinematic data, a video, a semantic segmentation of each frame, and a workflow annotation.
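The snippet below is a minimal sketch, assuming a hypothetical directory layout and file names, of how one PETRAW case (kinematic data, video, per-frame segmentation, workflow annotation) and the 90/60 split could be represented in Python; it is not the official loading code.

```python
# Hypothetical representation of a PETRAW case; field names and paths are assumptions.
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class PetrawCase:
    case_id: str
    kinematics_csv: Path    # kinematic signals recorded during the session
    video_file: Path        # video of the peg-transfer session
    segmentation_dir: Path  # one semantic-segmentation mask per frame
    workflow_file: Path     # workflow (phase/step) annotation


def load_split(root: Path, split: str) -> List[PetrawCase]:
    """Collect cases from an assumed layout root/<split>/<case_id>/."""
    cases = []
    for case_dir in sorted((root / split).iterdir()):
        cases.append(PetrawCase(
            case_id=case_dir.name,
            kinematics_csv=case_dir / "kinematics.csv",
            video_file=case_dir / "video.mp4",
            segmentation_dir=case_dir / "segmentation",
            workflow_file=case_dir / "workflow.txt",
        ))
    return cases


# Usage: 90 training cases and 60 test cases, per the dataset description.
# train_cases = load_split(Path("PETRAW"), "train")
# test_cases = load_split(Path("PETRAW"), "test")
```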
3 PAPERS • 6 BENCHMARKS
RoCoG-v2 (Robot Control Gestures) is a dataset intended to support the study of synthetic-to-real and ground-to-air video domain adaptation. It contains over 100K synthetically-generated videos of human avatars performing gestures from seven (7) classes. It also provides videos of real humans performing the same gestures from both ground and air perspectives.
3 PAPERS • 1 BENCHMARK
Website: https://asankagp.github.io/droneaction/
2 PAPERS • 1 BENCHMARK
The largest, first-of-its-kind, in-the-wild, fine-grained workout/exercise posture analysis dataset, covering three exercises: BackSquat, Barbell Row, and Overhead Press. Seven different types of exercise errors are covered, and unlabeled data is also provided to facilitate self-supervised learning.
2 PAPERS • NO BENCHMARKS YET
MetaVD is a Meta Video Dataset for enhancing human action recognition datasets. It provides human-annotated relationship labels between action classes across human action recognition datasets. MetaVD is proposed in the following paper: Yuya Yoshikawa, Yutaro Shigeto, and Akikazu Takeuchi. "MetaVD: A Meta Video Dataset for enhancing human action recognition datasets." Computer Vision and Image Understanding 212 (2021): 103276.
A curated and 3-D pose-annotated subset of RGB videos sourced from Kinetics-700, a large-scale action dataset.
A dataset derived from the recently introduced Mimetics dataset.
2 PAPERS • 2 BENCHMARKS
Audiovisual Moments in Time (AVMIT) is a large-scale dataset of audiovisual action events. The dataset includes the annotation of 57,177 audiovisual videos from the Moments in Time dataset, each independently evaluated by 3 of 11 trained participants. Each annotation pertains to whether the labelled audiovisual action event is present and whether it is the most prominent feature of the video. The dataset also provides a curated test set of 960 videos across 16 classes, suitable for comparative experiments involving computational models and human participants, specifically when addressing research questions where audiovisual correspondence is of critical importance.
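As an illustration only, the sketch below shows one way such annotations could be filtered; the file name and column names are assumptions, not the dataset's actual format.

```python
# Hypothetical aggregation of AVMIT-style ratings: each video is judged by 3 raters
# on whether the labelled audiovisual event is present and whether it is the most
# prominent feature of the video.
import csv
from collections import defaultdict

votes = defaultdict(list)
with open("avmit_annotations.csv", newline="") as f:    # file name is an assumption
    for row in csv.DictReader(f):
        present = row["event_present"] == "yes"          # column names are assumptions
        prominent = row["most_prominent"] == "yes"
        votes[row["video_id"]].append(present and prominent)

# Keep videos where a majority of the 3 raters agree the event is present and prominent.
reliable_videos = [vid for vid, v in votes.items() if sum(v) >= 2]
print(f"{len(reliable_videos)} videos pass the majority filter")
```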
1 PAPER • NO BENCHMARKS YET
BEAR (Benchmark on video Action Recognition) is a collection of 18 video datasets grouped into 5 categories (anomaly, gesture, daily, sports, and instructional), which covers a diverse set of real-world applications.
Existing image/video datasets for cattle behavior recognition are mostly small, lack well-defined labels, or are collected in unrealistic controlled environments. This limits the utility of machine learning (ML) models learned from them. Therefore, we introduce a new dataset, called Cattle Visual Behaviors (CVB), that consists of 502 video clips, each fifteen seconds long, captured in natural lighting conditions, and annotated with eleven visually perceptible behaviors of grazing cattle. By creating and sharing CVB, our aim is to develop improved models capable of recognizing all important cattle behaviors accurately and to assist other researchers and practitioners in developing and evaluating new ML models for cattle behavior classification using video data. The dataset is presented in the form of the following three sub-directories. 1. raw_frames: each sub-folder contains the 450 frames of a 15-second video captured at a frame rate of 30 FPS. 2. annotations: contains the json file
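The following is a minimal sketch, assuming hypothetical paths and frame file names, of reading one CVB clip from the raw_frames layout described above (450 frames per sub-folder, i.e. 15 seconds at 30 FPS); it is not the dataset's official tooling.

```python
# Read one clip's frames from an assumed CVB/raw_frames/<clip>/ layout.
from pathlib import Path

import cv2  # OpenCV, assumed available for frame decoding

FPS = 30
CLIP_SECONDS = 15
EXPECTED_FRAMES = FPS * CLIP_SECONDS  # 450 frames per clip


def load_clip(clip_dir: Path):
    """Return the clip's frames as a list of BGR images, sorted by filename."""
    frame_paths = sorted(clip_dir.glob("*.jpg"))  # frame file format is an assumption
    assert len(frame_paths) == EXPECTED_FRAMES, f"unexpected frame count in {clip_dir}"
    return [cv2.imread(str(p)) for p in frame_paths]


# Usage: iterate over every clip folder under raw_frames/.
# for clip_dir in sorted(Path("CVB/raw_frames").iterdir()):
#     frames = load_clip(clip_dir)
```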
Understanding comprehensive assembly knowledge from videos is critical for a futuristic ultra-intelligent industry. To enable technological breakthroughs, we present HA-ViD – an assembly video dataset that features representative industrial assembly scenarios, a natural procedural knowledge acquisition process, and consistent human-robot shared annotations. Specifically, HA-ViD captures diverse collaboration patterns of real-world assembly as well as natural human behaviors and learning progression during assembly, and granulates action annotations into subject, action verb, manipulated object, target object, and tool. We provide 3,222 multi-view, multi-modality videos, 1.5M frames, 96K temporal labels, and 2M spatial labels. We benchmark four foundational video understanding tasks: action recognition, action segmentation, object detection, and multi-object tracking. Importantly, we analyze their performance and the further reasoning steps needed to comprehend knowledge of assembly progress and process efficiency.
We introduce an RGB+S dataset named “Industrial Human Action Recognition Dataset” (InHARD), collected in a real-world setting for industrial human action recognition, with over 2 million frames from 16 distinct subjects. The dataset contains 13 different industrial action classes and over 4,800 action samples. It should enable the study and development of various learning techniques for analyzing human actions in industrial environments involving human-robot collaboration.
This is a subset of Kinetics-400, introduced in Look, Listen and Learn by Relja Arandjelovic and Andrew Zisserman.
MCAD is designed to evaluate the open-view classification problem in surveillance environments. In total, it contains 14,298 action samples from 18 action categories, performed by 20 subjects and independently recorded with 5 cameras.
MOD20 is an action recognition dataset consisting of videos collected from YouTube and our own drone. The dataset contains 2,324 videos lasting a total of 240 minutes. The actions were selected from challenging and complex scenarios, and cover multiple viewpoints, from ground level to bird's-eye view. The substantial variation in body size, number of people, viewpoints, camera motion, and background makes this dataset challenging for action recognition. The action classes, 720×720 undistorted clips, and multi-viewpoint video selection extend the dataset's applicability to a wider research community.
Metaphorics is a newly introduced non-contextual skeleton action dataset. All skeleton-based human action recognition datasets introduced so far have categories based only on verb-based actions.
A large, annotated video dataset of mice performing a sequence of actions. The dataset was collected and labeled by experts for the purpose of neuroscience research.
The dataset is collected from YouTube videos that contain fight instances; some non-fight sequences from regular surveillance camera videos are also included. There are 300 videos in total (150 fight and 150 non-fight), each 2 seconds long, and only the fight-related parts are included in the samples.
TinyVIRAT-v2 is a benchmark dataset for recognizing real-world low-resolution activities present in videos. The dataset is comprised of naturally occurring low-resolution actions. This is an extension of the TinyVIRAT dataset and consists of actions with multiple labels. The videos are extracted from security footage, which makes them realistic and more challenging.
The VIPriors Action Recognition Challenge uses a subset of the UCF101 action recognition dataset.
Existing benchmarks for real-world distribution shift are generally generated synthetically via augmentations that simulate shifts such as weather and camera rotation. The UCF101-DS dataset instead consists of real-world distribution shifts from user-generated videos, without synthetic augmentation. It has videos for 47 UCF-101 classes with 63 different distribution shifts grouped into 15 categories, comprising 536 unique videos split into a total of 4,708 clips. Each clip is 7 to 10 seconds long.
VFD-2000 is a video fight detection dataset containing more than 2,000 videos, with YouTube as the data source. Specific scenarios were searched using “fight” as the keyword, for example “street fight”, “beach fight”, and “violence in the restaurant”, and 200 videos were collected across 20 different scenes.
A first-of-its-kind paired win-fail action understanding dataset with samples from the following domains: “General Stunts,” “Internet Wins-Fails,” “Trick Shots,” and “Party Games.” The task is to identify successful and failed attempts at various activities. Unlike existing action recognition datasets, intra-class variation is high, making the task challenging yet feasible.
1 PAPER • 2 BENCHMARKS
Human activity recognition and clinical biomechanics are challenging problems in physical telerehabilitation medicine. However, most publicly available datasets on human body movements cannot be used to study both problems in an out-of-the-lab movement acquisition setting. The objective of the VIDIMU dataset is to pave the way towards affordable patient tracking solutions for remote daily life activities recognition and kinematic analysis.
0 PAPER • NO BENCHMARKS YET