AGORA is a synthetic human dataset with high realism and accurate ground truth. It consists of around 14K training and 3K test images, created by rendering between 5 and 15 people per image using either image-based lighting or rendered 3D environments, with care taken to make the images physically plausible and photorealistic. In total, AGORA contains 173K individual person crops. AGORA provides (1) SMPL/SMPL-X parameters and (2) a segmentation mask for each subject in an image.
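Below is a minimal sketch of how such per-subject annotations could be consumed. The file layout (one pickle of person dicts per image, one mask PNG per subject) and the dictionary keys are hypothetical assumptions for illustration, not AGORA's actual release format.

```python
# Hypothetical per-image annotation layout: ann_dir/<image_id>.pkl holds a
# list of per-person SMPL-X parameter dicts; mask_dir holds one PNG mask
# per subject. This is an illustrative assumption, not AGORA's real format.
import pickle
from pathlib import Path

import numpy as np
from PIL import Image

def load_people(ann_dir: Path, mask_dir: Path, image_id: str):
    """Yield (smplx_params, mask) for every subject rendered in one image."""
    with open(ann_dir / f"{image_id}.pkl", "rb") as f:
        people = pickle.load(f)  # assumed: list of dicts of SMPL-X parameters
    for idx, person in enumerate(people):
        params = {
            "betas": np.asarray(person["betas"]),          # body shape
            "body_pose": np.asarray(person["body_pose"]),  # axis-angle pose
        }
        mask = np.array(Image.open(mask_dir / f"{image_id}_{idx:02d}.png"))
        yield params, mask
```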
61 PAPERS • 4 BENCHMARKS
BEHAVE is a full-body human-object interaction dataset with multi-view RGBD frames and corresponding 3D SMPL and object fits, along with annotated contacts between them. The dataset contains ~15k frames captured at 5 locations, with 8 subjects performing a wide range of interactions with 20 common objects.
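The following sketch illustrates reading one frame's person fit, object fit, and contact labels from a BEHAVE-style layout. The file name and JSON keys are illustrative assumptions; the real dataset ships its own toolkit and file structure.

```python
# Sketch of reading one frame's annotations. "fits.json" and its keys are
# assumptions for illustration, not BEHAVE's actual on-disk schema.
import json
from pathlib import Path

import numpy as np

def load_frame_fits(frame_dir: Path) -> dict:
    """Return the SMPL fit, object fit, and contact labels for one frame."""
    with open(frame_dir / "fits.json") as f:
        fits = json.load(f)  # assumed: {"smpl": {...}, "object": {...}, ...}
    return {
        "smpl_pose": np.asarray(fits["smpl"]["pose"]),        # SMPL body pose
        "smpl_betas": np.asarray(fits["smpl"]["betas"]),      # SMPL shape
        "obj_pose": np.asarray(fits["object"]["pose"]),       # object 6-DoF pose
        "contacts": np.asarray(fits["contacts"], dtype=bool), # per-vertex contacts
    }
```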
43 PAPERS • 3 BENCHMARKS
Dynamic FAUST extends the FAUST dataset to dynamic 4D data. It consists of high-resolution 4D scans of human subjects in motion, captured at 60 fps.
28 PAPERS • 1 BENCHMARK
4D-DRESS is the first real-world 4D dataset of human clothing, capturing 64 human outfits in more than 520 motion sequences. These sequences include (a) high-quality 4D textured scans; for each scan, we annotate (b) vertex-level semantic labels, from which we obtain (c) the corresponding garment meshes and fitted SMPL(-X) body meshes. In total, 4D-DRESS captures the dynamic motion of 4 dresses, 28 lower, 30 upper, and 32 outer garments. For each garment, we also provide its canonical template mesh to benefit future research on human clothing.
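As a sketch of the step from (b) to (c), the snippet below extracts a garment submesh from per-vertex semantic labels. The array names and label ids are generic illustrations, not the dataset's actual schema.

```python
# Extract a garment submesh given per-vertex semantic labels.
# verts: (V, 3) float array; faces: (F, 3) int array; vert_labels: (V,) ints.
# The label ids are hypothetical; 4D-DRESS defines its own label set.
import numpy as np

def extract_garment(verts, faces, vert_labels, garment_id):
    """Keep faces whose three vertices all carry `garment_id`, reindexed."""
    keep = np.all(vert_labels[faces] == garment_id, axis=1)
    sub_faces = faces[keep]
    used = np.unique(sub_faces)                  # vertices referenced by kept faces
    remap = np.full(len(verts), -1, dtype=np.int64)
    remap[used] = np.arange(len(used))           # old vertex index -> compact index
    return verts[used], remap[sub_faces]
```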
22 PAPERS • 4 BENCHMARKS
SSP-3D is an evaluation dataset consisting of 311 images of sportspersons in tight-fitting clothing, covering a variety of body shapes and poses. The images were collected from the Sports-1M dataset. SSP-3D is intended as a benchmark for body shape prediction methods. Pseudo-ground-truth 3D shape labels (using the SMPL body model) were obtained via multi-frame optimisation with shape consistency between frames, as described in the accompanying paper.
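A minimal sketch of the multi-frame, shape-consistent idea: a single shared SMPL shape vector is optimised jointly with per-frame poses, so shape consistency holds by construction. The `project_and_score` callable is a placeholder for a reprojection loss; SSP-3D's actual objective and weights are specified in its paper.

```python
# Shared-shape fitting sketch: one shape vector for all frames, one pose
# per frame. project_and_score(betas, pose, kpts) -> scalar loss is an
# assumed user-supplied reprojection term, not SSP-3D's actual objective.
import torch

def fit_shared_shape(frames_kpts_2d, project_and_score, num_frames, iters=200):
    betas = torch.zeros(10, requires_grad=True)              # shared SMPL shape
    poses = torch.zeros(num_frames, 72, requires_grad=True)  # per-frame SMPL pose
    opt = torch.optim.Adam([betas, poses], lr=0.01)
    for _ in range(iters):
        opt.zero_grad()
        # the single betas tensor enforces shape consistency across frames
        loss = sum(project_and_score(betas, poses[i], kpts)
                   for i, kpts in enumerate(frames_kpts_2d))
        loss.backward()
        opt.step()
    return betas.detach(), poses.detach()
```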
18 PAPERS • 1 BENCHMARK
CustomHumans was recorded with a multi-view photogrammetry system equipped with 53 RGB cameras (12 megapixels) and 53 IR cameras (4 megapixels). Each resulting high-quality scan comprises a 40K-face mesh alongside a 4K-resolution texture map. In addition to the high-quality scans, CustomHumans provides accurately registered SMPL-X parameters obtained with a customized mesh registration pipeline. 80 participants were invited for data capture. Each was instructed to perform several movements, such as "T-pose", "Hands up", "Squat", "Turning head", and "Hand gestures", in a 10-second sequence (300 frames). The 4-5 best-quality meshes from each sequence were selected as data samples. In total, the dataset contains more than 600 high-quality scans with 120 different garments.
13 PAPERS • 1 BENCHMARK
Contains 60 female and 30 male actors performing a collection of 20 predefined everyday actions and sports movements, plus one self-chosen movement.
11 PAPERS • 1 BENCHMARK
A dataset of clothing size variation comprising different subjects wearing casual clothing items in various sizes, totaling approximately 2,000 scans. The dataset includes the raw scans, registrations to the SMPL model, scans segmented into clothing parts, and garment category and size labels.
11 PAPERS • NO BENCHMARKS YET
The CAPE dataset is a dynamic 3D dataset of clothed humans.
9 PAPERS • 1 BENCHMARK
The MMBody dataset provides human body data with motion capture, ground-truth meshes, Kinect RGBD, and millimeter-wave sensor data. See the homepage for more details.
8 PAPERS • NO BENCHMARKS YET
X-Humans consists of 20 subjects (11 male, 9 female) with various clothing types and hairstyles. The collection of this dataset was approved by an internal ethics committee. For each subject, the motion sequences are split into a training and a test set; in total, there are 29,036 training poses and 6,439 test poses. X-Humans also contains ground-truth SMPL-X parameters, obtained via a custom SMPL-X registration pipeline specifically designed to deal with low-resolution body parts.
7 PAPERS • NO BENCHMARKS YET
Human Bodies in the Wild (HBW) is a validation and test set for body shape estimation. It consists of images taken in the wild and ground-truth 3D body scans in SMPL-X topology. To create HBW, we collected body scans of 35 participants and registered the SMPL-X model to the scans. Each participant was additionally photographed in various outfits and poses in front of a white background and uploaded full-body photos of themselves taken in the wild. The validation and test set images are released; the ground-truth shape is released only for the validation set.
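For intuition, the snippet below computes the nearest-neighbour scan-to-model distance, the kind of term commonly minimised when registering a parametric body model to a scan. This is a generic illustration, not HBW's actual registration pipeline.

```python
# Mean distance from each scan point to the closest vertex of a fitted
# body model. A generic registration/evaluation term, not HBW's pipeline.
import numpy as np
from scipy.spatial import cKDTree

def scan_to_model_distance(model_verts: np.ndarray,
                           scan_points: np.ndarray) -> float:
    """model_verts: (V, 3) fitted model vertices; scan_points: (N, 3)."""
    tree = cKDTree(model_verts)
    dists, _ = tree.query(scan_points)  # nearest model vertex per scan point
    return float(dists.mean())
```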
6 PAPERS • NO BENCHMARKS YET
We learn high-fidelity human depths by leveraging a collection of social media dance videos scraped from the TikTok mobile social networking application, by far one of the most popular video-sharing applications across generations, featuring short videos (10-15 seconds) of diverse dance challenges. We manually selected more than 300 dance videos from TikTok dance challenge compilations, spanning different months, varieties, and types of dances, each capturing a single person performing moderate movements that do not cause excessive motion blur. For each video, we extract RGB images at 30 frames per second, resulting in more than 100K images. We segmented these images using the Removebg application and computed UV coordinates with DensePose.
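A minimal sketch of the frame-extraction step using OpenCV follows; the segmentation (remove.bg) and DensePose UV stages are external tools and are not reproduced here.

```python
# Dump every frame of a ~30 fps video to numbered PNGs with OpenCV.
# Subsequent segmentation and DensePose steps run on the saved frames.
from pathlib import Path

import cv2

def extract_frames(video_path: str, out_dir: str) -> int:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()  # returns False once the video is exhausted
        if not ok:
            break
        cv2.imwrite(str(out / f"{idx:06d}.png"), frame)
        idx += 1
    cap.release()
    return idx  # number of frames written
```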
A synthetic dataset for evaluating non-rigid 3D human reconstruction based on conventional RGB-D cameras. The dataset consists of seven motion sequences of a single human model.
1 PAPER • NO BENCHMARKS YET
The volumetric representation of human interactions is one of the fundamental domains in the development of immersive media productions and telecommunication applications. Particularly in the context of the rapid advancement of Extended Reality (XR) applications, volumetric data has proven to be an essential technology for future XR development. In this work, we present a new multimodal database to help advance the development of immersive technologies. Our database provides ethically compliant and diverse volumetric data, in particular 27 participants displaying posed facial expressions and subtle body movements while speaking, plus 11 participants wearing head-mounted displays (HMDs). The recording system consists of a volumetric capture (VoCap) studio comprising 31 synchronized modules with 62 RGB cameras and 31 depth cameras. In addition to textured meshes, point clouds, and multi-view RGB-D data, we use one Lytro Illum camera to provide light field (LF) data simultaneously.
0 PAPERS • NO BENCHMARKS YET