The Waymo Open Dataset is comprised of high resolution sensor data collected by autonomous vehicles operated by the Waymo Driver in a wide variety of conditions.
377 PAPERS • 12 BENCHMARKS
The 3D Poses in the Wild dataset is the first dataset in the wild with accurate 3D poses for evaluation. While other datasets outdoors exist, they are all restricted to a small recording volume. 3DPW is the first one that includes video footage taken from a moving phone camera.
340 PAPERS • 5 BENCHMARKS
AMASS is a large database of human motion unifying different optical marker-based motion capture datasets by representing them within a common framework and parameterization. AMASS is readily useful for animation, visualization, and generating training data for deep learning.
285 PAPERS • 1 BENCHMARK
MPI-INF-3DHP is a 3D human body pose estimation dataset consisting of both constrained indoor and complex outdoor scenes. It records 8 actors performing 8 activities from 14 camera views. It consists on >1.3M frames captured from the 14 cameras.
257 PAPERS • 6 BENCHMARKS
AGORA is a synthetic human dataset with high realism and accurate ground truth. It consists of around 14K training and 3K test images by rendering between 5 and 15 people per image using either image-based lighting or rendered 3D environments, taking care to make the images physically plausible and photoreal. In total, AGORA contains 173K individual person crops. AGORA provides (1) SMPL/SMPL-X parameters and (2) segmentation masks for each subject in images.
55 PAPERS • 4 BENCHMARKS
The TotalCapture dataset consists of 5 subjects performing several activities such as walking, acting, a range of motion sequence (ROM) and freestyle motions, which are recorded using 8 calibrated, static HD RGB cameras and 13 IMUs attached to head, sternum, waist, upper arms, lower arms, upper legs, lower legs and feet, however the IMU data is not required for our experiments. The dataset has publicly released foreground mattes and RGB images. Ground-truth poses are obtained using a marker-based motion capture system, with the markers are <5mm in size. All data is synchronised and operates at a framerate of 60Hz, providing ground truth poses as joint positions.
51 PAPERS • 2 BENCHMARKS
Inferring human-scene contact (HSC) is the first step toward understanding how humans interact with their surroundings. While detecting 2D human-object interaction (HOI) and reconstructing 3D human pose and shape (HPS) have enjoyed significant progress, reasoning about 3D human-scene contact from a single image is still challenging. Existing HSC detection methods consider only a few types of predefined contact, often reduce body and scene to a small number of primitives, and even overlook image evidence. To predict human-scene contact from a single image, we address the limitations above from both data and algorithmic perspectives. We capture a new dataset called RICH for “Real scenes, Interaction, Contact and Humans.” RICH contains multiview outdoor/indoor video sequences at 4K resolution, ground-truth 3D human bodies captured using markerless motion capture, 3D body scans, and high resolution 3D scene scans. A key feature of RICH is that it also contains accurate vertex-level contact
38 PAPERS • 1 BENCHMARK
HPS Dataset is a collection of 3D humans interacting with large 3D scenes (300-1000 $m^2$, up to 2500 $m^2$). The dataset contains images captured from a head-mounted camera coupled with the reference 3D pose and location of the person in a pre-scanned 3D scene. 7 people in 8 large scenes are captured performing activities such as exercising, reading, eating, lecturing, using a computer, making coffee, dancing. The dataset provides more than 300K synchronized RGB images coupled with the reference 3D pose and location.
18 PAPERS • NO BENCHMARKS YET
AIST++ is a 3D dance dataset which contains 3D motion reconstructed from real dancers paired with music. The AIST++ Dance Motion Dataset is constructed from the AIST Dance Video DB. With multi-view videos, an elaborate pipeline is designed to estimate the camera parameters, 3D human keypoints and 3D human dance motion sequences:
15 PAPERS • 2 BENCHMARKS
EMDB contains in-the-wild videos of human activity recorded with a hand-held iPhone. It features reference SMPL body pose and shape parameters, as well as global body root and camera trajectories. The reference 3D poses were obtained by jointly fitting SMPL to 12 body-worn electromagnetic sensors and image data. For the latter we fit a neural implicit avatar model to allow for a dense pixel-wise fitting objective.
SSP-3D is an evaluation dataset consisting of 311 images of sportspersons in tight-fitted clothes, with a variety of body shapes and poses. The images were collected from the Sports-1M dataset. SSP-3D is intended for use as a benchmark for body shape prediction methods. Pseudo-ground-truth 3D shape labels (using the SMPL body model) were obtained via multi-frame optimisation with shape consistency between frames, as described here.
14 PAPERS • 1 BENCHMARK
Dataset of clothing size variation which includes different subjects wearing casual clothing items in various sizes, totaling to approximately 2000 scans. This dataset includes the scans, registrations to the SMPL model, scans segmented in clothing parts, garment category and size labels.
10 PAPERS • NO BENCHMARKS YET
Unite The People is a dataset for 3D body estimation. The images come from the Leeds Sports Pose dataset and its extended version, as well as the single person tagged people from the MPII Human Pose Dataset. The images are labeled with different types of annotations such as segmentation labels, pose or 3D.
9 PAPERS • NO BENCHMARKS YET
4D-OR includes a total of 6734 scenes, recorded by six calibrated RGB-D Kinect sensors 1 mounted to the ceiling of the OR, with one frame-per-second, providing synchronized RGB and depth images. We provide fused point cloud sequences of entire scenes, automatically annotated human 6D poses and 3D bounding boxes for OR objects. Furthermore, we provide SSG annotations for each step of the surgery together with the clinical roles of all the humans in the scenes, e.g., nurse, head surgeon, anesthesiologist.
8 PAPERS • 1 BENCHMARK
Accurate 3D human pose estimation is essential for sports analytics, coaching, and injury prevention. However, existing datasets for monocular pose estimation do not adequately capture the challenging and dynamic nature of sports movements. In response, we introduce SportsPose, a large-scale 3D human pose dataset consisting of highly dynamic sports movements. With more than 176,000 3D poses from 24 different subjects performing 5 different sports activities, SportsPose provides a diverse and comprehensive set of 3D poses that reflect the complex and dynamic nature of sports movements. Contrary to other markerless datasets we have quantitatively evaluated the precision of SportsPose by comparing our poses with a commercial marker-based system and achieve a mean error of 34.5 mm across all evaluation sequences. This is comparable to the error reported on the commonly used 3DPW dataset. We further introduce a new metric, local movement, which describes the movement of the wrist and ankle
3 PAPERS • NO BENCHMARKS YET
InfiniteRep is a synthetic, open-source dataset for fitness and physical therapy (PT) applications. It includes 1k videos of diverse avatars performing multiple repetitions of common exercises. It includes significant variation in the environment, lighting conditions, avatar demographics, and movement trajectories. From cadence to kinematic trajectory, each rep is done slightly differently -- just like real humans. InfiniteRep videos are accompanied by a rich set of pixel-perfect labels and annotations, including frame-specific repetition counts.
0 PAPER • NO BENCHMARKS YET
Human activity recognition and clinical biomechanics are challenging problems in physical telerehabilitation medicine. However, most publicly available datasets on human body movements cannot be used to study both problems in an out-of-the-lab movement acquisition setting. The objective of the VIDIMU dataset is to pave the way towards affordable patient tracking solutions for remote daily life activities recognition and kinematic analysis.