The Breakfast Actions Dataset comprises 10 actions related to breakfast preparation, performed by 52 individuals in 18 different kitchens. It is one of the largest fully annotated datasets available, with over 77 hours of video recorded “in the wild” rather than in a single controlled lab environment.
76 PAPERS • 2 BENCHMARKS
The JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS) is a surgical activity dataset for human motion modeling. The data was collected through a collaboration between The Johns Hopkins University (JHU) and Intuitive Surgical, Inc. (Sunnyvale, CA; ISI) within an IRB-approved study, and the release of the dataset was approved by the Johns Hopkins University IRB. The data was captured using the da Vinci Surgical System from eight surgeons with different skill levels performing five repetitions each of three elementary surgical tasks on a bench-top model: suturing, knot-tying, and needle-passing, which are standard components of most surgical skills training curricula. The JIGSAWS dataset consists of three components: kinematic data, video data, and manual annotations (gesture labels and skill ratings).
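For illustration, here is a minimal sketch of how per-trial gesture annotations of this kind might be parsed, assuming a plain-text transcription that lists one segment per line as start frame, end frame, and gesture label; the file path and label codes below are hypothetical, not a specification of the release format.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class GestureSegment:
    start_frame: int  # first frame of the gesture (inclusive)
    end_frame: int    # last frame of the gesture (inclusive)
    label: str        # gesture code, e.g. "G1" (assumed labeling scheme)


def load_transcription(path: Path) -> list[GestureSegment]:
    """Parse one trial's gesture transcription into a list of segments."""
    segments = []
    for line in path.read_text().splitlines():
        if not line.strip():
            continue
        start, end, label = line.split()[:3]
        segments.append(GestureSegment(int(start), int(end), label))
    return segments


if __name__ == "__main__":
    # Hypothetical path; the actual layout depends on how the dataset is unpacked.
    segs = load_transcription(Path("transcriptions/Suturing_B001.txt"))
    print(f"{len(segs)} gesture segments, "
          f"covering frames {segs[0].start_frame}-{segs[-1].end_frame}")
```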
61 PAPERS • 2 BENCHMARKS
The Georgia Tech Egocentric Activities (GTEA) dataset contains seven types of daily activities, such as making a sandwich, tea, or coffee. Each activity is performed by four different people, yielding 28 videos in total. Each video lasts roughly one minute and contains about 20 fine-grained action instances, such as take bread or pour ketchup.
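As a rough sketch (not the dataset's own tooling), fine-grained instances of this kind can be represented as verb-object segments and expanded into the per-frame label sequence that most action-segmentation pipelines expect; the class names, frame indices, and frame rate below are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ActionInstance:
    verb: str                  # e.g. "take", "pour"
    objects: tuple[str, ...]   # e.g. ("bread",) or ("ketchup",)
    start_frame: int
    end_frame: int


# Illustrative annotations for one hypothetical GTEA video.
video_annotations = [
    ActionInstance("take", ("bread",), 120, 210),
    ActionInstance("pour", ("ketchup",), 230, 390),
]

# Frame-level label sequence: every frame gets the label of the instance
# covering it, or "background" if no instance covers it.
num_frames = 1800  # ~1 minute at an assumed 30 fps
frame_labels = ["background"] * num_frames
for inst in video_annotations:
    for f in range(inst.start_frame, inst.end_frame + 1):
        frame_labels[f] = f"{inst.verb}_{'_'.join(inst.objects)}"

print(frame_labels[150], frame_labels[300], frame_labels[1000])
```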
56 PAPERS • 2 BENCHMARKS
Activity recognition research has shifted focus from distinguishing full-body motion patterns to recognizing complex interactions of multiple entities. Manipulative gestures, characterized by interactions between hands, tools, and manipulable objects, frequently occur in food preparation, manufacturing, and assembly tasks, and have applications including situational support, automated supervision, and skill assessment. With the aim of stimulating research on recognizing manipulative gestures, we introduce the 50 Salads dataset. It captures 25 people preparing 2 mixed salads each and contains over 4 hours of annotated accelerometer and RGB-D video data. With its detailed annotations, multiple sensor types, and two sequences per participant, the 50 Salads dataset may be used for research in areas such as activity recognition, activity spotting, sequence analysis, progress tracking, sensor fusion, transfer learning, and user adaptation.
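A minimal sketch of the kind of sensor fusion the dataset enables: aligning accelerometer samples to video frames by nearest timestamp. The sampling rates, array shapes, and synthetic readings below are assumptions for illustration, not the dataset's actual specification.

```python
import numpy as np

# Toy stand-ins for the two modalities (timestamps in seconds).
accel_t = np.arange(0.0, 10.0, 1 / 50)        # assumed 50 Hz accelerometer clock
accel_xyz = np.random.randn(len(accel_t), 3)  # fake 3-axis readings
frame_t = np.arange(0.0, 10.0, 1 / 30)        # assumed 30 Hz video clock


def nearest_accel_sample(frame_time: float) -> np.ndarray:
    """Return the accelerometer reading closest in time to a video frame."""
    i = np.searchsorted(accel_t, frame_time)
    i = min(i, len(accel_t) - 1)
    if i > 0 and abs(accel_t[i - 1] - frame_time) < abs(accel_t[i] - frame_time):
        i -= 1
    return accel_xyz[i]


# One accelerometer sample per video frame, ready to concatenate with visual features.
fused = np.stack([nearest_accel_sample(t) for t in frame_t])
print(fused.shape)
```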
12 PAPERS • 1 BENCHMARK
The Watch-n-Patch dataset was created to support modeling of human activities that comprise multiple actions in a completely unsupervised setting. It was collected with a Microsoft Kinect One sensor and totals about 230 minutes of footage, divided into 458 videos. Seven subjects perform daily activities in 8 offices and 5 kitchens with complex backgrounds. Skeleton data are also provided as ground-truth annotations.
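For illustration only, one way to hold a single skeleton frame in memory and derive a simple pose feature from it; the joint names and coordinates below are assumed, not taken from the dataset's file format.

```python
import math

# One skeleton frame as joint name -> (x, y, z) in metres; joint names follow
# a Kinect-style convention and the coordinates are made up for illustration.
skeleton_frame = {
    "head":       (0.02, 0.65, 2.10),
    "hand_right": (0.30, 0.10, 1.95),
    "hand_left":  (-0.28, 0.12, 1.98),
}


def joint_distance(frame: dict, a: str, b: str) -> float:
    """Euclidean distance between two joints in a single skeleton frame."""
    return math.dist(frame[a], frame[b])


# A simple pose feature of the kind unsupervised activity models are often built on.
print(joint_distance(skeleton_frame, "head", "hand_right"))
```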
10 PAPERS • NO BENCHMARKS YET
The TUM Kitchen dataset is an action recognition dataset containing 20 video sequences captured by 4 cameras with overlapping views. The camera network captures the scene from four viewpoints at 25 fps, and every RGB frame has a resolution of 384×288 pixels. The action labels are frame-wise and are provided separately for the left arm, the right arm, and the torso.
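A small sketch of how frame-wise, per-body-part labels might be held in memory, so that each body part can be treated as its own frame-level classification target; the label vocabulary shown is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FrameLabels:
    left_arm: str
    right_arm: str
    torso: str


# Illustrative labels for three consecutive frames of one sequence; the real
# label set and file format should be taken from the dataset release.
labels = [
    FrameLabels("reaching", "idle", "standing"),
    FrameLabels("taking",   "idle", "standing"),
    FrameLabels("taking",   "idle", "walking"),
]

# Because every body part is annotated per frame, each part yields its own
# label track that can be evaluated as a separate frame-wise problem.
left_arm_track = [f.left_arm for f in labels]
print(left_arm_track)
```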
7 PAPERS • NO BENCHMARKS YET