The Human3.6M dataset is one of the largest motion capture datasets, which consists of 3.6 million human poses and corresponding images captured by a high-speed motion capture system. There are 4 high-resolution progressive scan cameras to acquire video data at 50 Hz. The dataset contains activities by 11 professional actors in 17 scenarios: discussion, smoking, taking photo, talking on the phone, etc., as well as provides accurate 3D joint positions and high-resolution videos.
719 PAPERS • 16 BENCHMARKS
The 3D Poses in the Wild dataset is the first dataset in the wild with accurate 3D poses for evaluation. While other datasets outdoors exist, they are all restricted to a small recording volume. 3DPW is the first one that includes video footage taken from a moving phone camera.
340 PAPERS • 5 BENCHMARKS
The GTA Indoor Motion dataset (GTA-IM) that emphasizes human-scene interactions in the indoor environments. It consists of HD RGB-D image sequences of 3D human motion from a realistic game engine. The dataset has clean 3D human pose and camera pose annotations, and large diversity in human appearances, indoor environments, camera views, and human activities.
9 PAPERS • 2 BENCHMARKS
PATS dataset consists of a diverse and large amount of aligned pose, audio and transcripts. With this dataset, we hope to provide a benchmark that would help develop technologies for virtual agents which generate natural and relevant gestures.
9 PAPERS • NO BENCHMARKS YET