113 dataset results for Pose Estimation

Unite The People is a dataset for 3D body estimation. The images come from the Leeds Sports Pose dataset and its extended version, as well as from people tagged as single persons in the MPII Human Pose Dataset. The images are labeled with several types of annotations, such as segmentation labels, pose, or 3D.

9 PAPERS • NO BENCHMARKS YET

CarFusion

We provide manual annotations of 14 semantic keypoints for 100,000 car instances (sedan, SUV, bus, and truck) from 53,000 images captured from 18 moving cameras at multiple intersections in Pittsburgh, PA. Please fill out the Google form to get an email with the download links.

8 PAPERS • 2 BENCHMARKS

HUMAN4D

HUMAN4D is a large and multimodal 4D dataset that contains a variety of human activities simultaneously captured by a professional marker-based MoCap, a volumetric capture and an audio recording system. By capturing 2 female and 2 male professional actors performing various full-body movements and expressions, HUMAN4D provides a diverse set of motions and poses encountered as part of single- and multi-person daily, physical and social activities (jumping, dancing, etc.), along with multi-RGBD (mRGBD), volumetric and audio data.

8 PAPERS • NO BENCHMARKS YET

UnrealEgo

UnrealEgo is a dataset that provides in-the-wild stereo images with a large variety of motions for 3D human pose estimation. It consists of stereo fisheye images and depth maps, each with a resolution of 1024×1024 pixels, captured at 25 frames per second. In total, the dataset contains 450k stereo pairs (900k images). Metadata is provided for each frame, including 3D joint positions, camera positions, and 2D coordinates of reprojected joint positions in the fisheye views.

8 PAPERS • 1 BENCHMARK

EgoCap

EgoCap is a dataset of 100,000 egocentric images of eight people in different clothing, with 75,000 images from six people used for training. The images were captured with two fisheye cameras.

7 PAPERS • NO BENCHMARKS YET

GPA (Geometric Pose Affordance)

Multi-view imagery of people interacting with a variety of rich 3D environments.

7 PAPERS • 2 BENCHMARKS

HandNet

The HandNet dataset contains depth images of 10 participants' hands non-rigidly deforming in front of a RealSense RGB-D camera. The annotations are generated by a magnetic annotation technique. 6D pose is available for the center of the hand as well as the five fingertips (i.e. position and orientation of each).

7 PAPERS • NO BENCHMARKS YET

ATRW (Amur Tiger Re-identification in the Wild)

The ATRW Dataset contains over 8,000 video clips from 92 Amur tigers, with bounding box, pose keypoint, and tiger identity annotations.

6 PAPERS • NO BENCHMARKS YET

Biwi Kinect Head Pose

Biwi Kinect Head Pose is a challenging dataset mainly inspired by the automotive setup. It is acquired with the Microsoft Kinect sensor, a structured IR light device. It contains about 15k frames, with RGB images (640 × 480) and depth maps (640 × 480). Twenty subjects were involved in the recordings: four of them were recorded twice, for a total of 24 sequences. The ground truth of yaw, pitch and roll angles is reported together with the head center and the calibration matrix.

6 PAPERS • NO BENCHMARKS YET

FAT (Falling Things)

Falling Things (FAT) is a dataset for advancing the state-of-the-art in object detection and 3D pose estimation in the context of robotics. It consists of generated photorealistic images with accurate 3D pose annotations for all objects in 60k images.

6 PAPERS • NO BENCHMARKS YET

MonoPerfCap Dataset

MonoPerfCap is a benchmark dataset for human 3D performance capture from monocular video input consisting of around 40k frames, which covers a variety of different scenarios.

6 PAPERS • NO BENCHMARKS YET

SLOPER4D

SLOPER4D is a novel scene-aware dataset collected in large urban environments to facilitate the research of global human pose estimation (GHPE) with human-scene interaction in the wild. It consists of 15 sequences of human motions, each of which has a trajectory length of more than 200 meters (up to 1,300 meters) and covers an area of more than 2,000 m² (up to 13,000 m²), including more than 100k LiDAR frames, 300k video frames, and 500k IMU-based motion frames. With SLOPER4D, we provide a detailed and thorough analysis of two critical tasks, camera-based 3D HPE and LiDAR-based 3D HPE in urban environments, and benchmark a new task, GHPE.

6 PAPERS • 1 BENCHMARK

BRACE (The Breakdancing Competition Dataset for Dance Motion Synthesis)

BRACE is a dataset for audio-conditioned dance motion synthesis challenging common assumptions for this task:

5 PAPERS • 2 BENCHMARKS

DREAM-dataset (Deep Robot-to-camera Extrinsics for Articulated Manipulators)

The DREAM dataset was introduced in the paper "Camera-to-Robot Pose Estimation from a Single Image" (ICRA 2020). This dataset consists of synthetic images (both with and without domain randomization) of three different robot manipulators (Franka Emika’s Panda, Kuka’s LBR iiwa 7 R800, and Rethink Robotics’ Baxter), as well as real-world images of Franka Emika’s Panda taken from various RGBD cameras (XBox 360 Kinect (XK), RealSense (RS), and Azure Kinect (AK)). Each instance in the dataset contains an RGB image, keypoint 3D/2D coordinates, the global camera-to-robot transformation, and joint state configurations (from both revolute and prismatic joints) of the robot. Tasks such as estimating robot pose (camera pose) from a single RGB image and camera-to-robot calibration can be conducted and evaluated on this dataset.

5 PAPERS • 1 BENCHMARK

PedX

PedX is a large-scale multi-modal collection of pedestrians at complex urban intersections. The dataset provides high-resolution stereo images and LiDAR data with manual 2D and automatic 3D annotations. The data was captured using two pairs of stereo cameras and four Velodyne LiDAR sensors.

5 PAPERS • NO BENCHMARKS YET

SynthHands

The SynthHands dataset is a dataset for hand pose estimation which consists of real captured hand motion retargeted to a virtual hand with natural backgrounds and interactions with different objects. The dataset contains data for male and female hands, both with and without interaction with objects. While the hand and foreground object are synthetically generated using Unity, the motion was obtained from real performances as described in the accompanying paper. In addition, real object textures and background images (depth and color) were used. Ground truth 3D positions are provided for 21 keypoints of the hand.

5 PAPERS • NO BENCHMARKS YET

FewSOL (A Dataset for Few-Shot Object Learning in Robotic Environments)

The Few-Shot Object Learning (FewSOL) dataset can be used for object recognition with a few images per object. It contains 336 real-world objects with 9 RGB-D images per object from different views. Object segmentation masks, object poses and object attributes are provided. In addition, synthetic images generated using 330 3D object models are used to augment the dataset. The FewSOL dataset can be used to study a set of few-shot object recognition problems such as classification, detection and segmentation, shape reconstruction, pose estimation, keypoint correspondences and attribute recognition.

4 PAPERS • NO BENCHMARKS YET

Fraunhofer IPA Bin-Picking

The Fraunhofer IPA Bin-Picking dataset is a large-scale dataset comprising both simulated and real-world scenes for various objects (potentially having symmetries) and is fully annotated with 6D poses. A physics simulation is used to create scenes of many parts in bulk by dropping objects in a random position and orientation above a bin. Additionally, this dataset extends the Siléane dataset by providing more samples. This makes it possible, for example, to train deep neural networks and benchmark their performance on the public Siléane dataset.

4 PAPERS • NO BENCHMARKS YET

IMUPoser

The IMUPoser Dataset is a dataset for estimating body pose using IMUs already in devices that many users own, namely smartphones, smartwatches, and earbuds.

4 PAPERS • NO BENCHMARKS YET

YCBInEOAT Dataset

A new dataset with significant occlusions related to object manipulation.

4 PAPERS • NO BENCHMARKS YET

Yoga-82

Dataset for large-scale yoga pose recognition with 82 classes.

4 PAPERS • NO BENCHMARKS YET

Composable activities dataset

The Composable activities dataset consists of 693 videos that contain activities in 16 classes performed by 14 actors. Each activity is composed of 3 to 11 atomic actions. RGB-D data for each sequence is captured using a Microsoft Kinect sensor, along with the estimated positions of relevant body joints.

3 PAPERS • NO BENCHMARKS YET

H3WB (Human 3.6M 3D WholeBody)

Human3.6M 3D WholeBody (H3WB) is a large-scale dataset with 133 whole-body keypoint annotations on 100K images, made possible by a new multi-view pipeline. It is designed for three new tasks: i) 3D whole-body pose lifting from 2D complete whole-body pose, ii) 3D whole-body pose lifting from 2D incomplete whole-body pose, iii) 3D whole-body pose estimation from a single RGB image.

3 PAPERS • 3 BENCHMARKS

HOPE-Image (Household Objects for Pose Estimation)

The NVIDIA HOPE datasets consist of RGBD images and video sequences with labeled 6-DoF poses for 28 toy grocery objects. The toy grocery objects are readily available for purchase and have ideal size and weight for robotic manipulation. 3D textured meshes for generating synthetic training data are provided.

3 PAPERS • NO BENCHMARKS YET

ICVL Hand Posture (ICVL Hand Posture Dataset)

The ICVL dataset is a hand pose estimation dataset that consists of 330K training frames and 2 testing sequences of 800 frames each. The dataset is collected from 10 different subjects with 16 hand joint annotations for each frame.

3 PAPERS • NO BENCHMARKS YET

PoPArt (Poses of People in Art: A Data Set for Human Pose Estimation in Digital Art History)

Throughout the history of art, the pose—as the holistic abstraction of the human body's expression—has proven to be a constant in numerous studies. However, due to the enormous amount of data that so far had to be processed by hand, its crucial role to the formulaic recapitulation of art-historical motifs since antiquity could only be highlighted selectively. This is true even for the now automated estimation of human poses, as domain-specific, sufficiently large data sets required for training computational models are either not publicly available or not indexed at a fine enough granularity. With the Poses of People in Art data set, we introduce the first openly licensed data set for estimating human poses in art and validating human pose estimators. It consists of 2,454 images from 22 art-historical depiction styles, including those that have increasingly turned away from lifelike representations of the body since the 19th century. A total of 10,749 human figures are precisely enclos

3 PAPERS • 1 BENCHMARK

SportsPose (SportsPose - A Dynamic 3D sports pose dataset)

Accurate 3D human pose estimation is essential for sports analytics, coaching, and injury prevention. However, existing datasets for monocular pose estimation do not adequately capture the challenging and dynamic nature of sports movements. In response, we introduce SportsPose, a large-scale 3D human pose dataset consisting of highly dynamic sports movements. With more than 176,000 3D poses from 24 different subjects performing 5 different sports activities, SportsPose provides a diverse and comprehensive set of 3D poses that reflect the complex and dynamic nature of sports movements. Contrary to other markerless datasets, we have quantitatively evaluated the precision of SportsPose by comparing our poses with a commercial marker-based system and achieve a mean error of 34.5 mm across all evaluation sequences. This is comparable to the error reported on the commonly used 3DPW dataset. We further introduce a new metric, local movement, which describes the movement of the wrist and ankle

3 PAPERS • NO BENCHMARKS YET

CHAIRS dataset

CHAIRS is a large-scale motion-captured dataset of full-body articulated human-object interaction (f-AHOI), consisting of 17.3 hours of versatile interactions between 46 participants and 81 articulated and rigid sittable objects. CHAIRS provides 3D meshes of both humans and articulated objects during the entire interactive process, as well as realistic and physically plausible full-body interactions.

2 PAPERS • NO BENCHMARKS YET

CORSMAL

CORSMAL is a dataset for estimating the position and orientation in 3D (or 6D pose) of an object from a single view. The dataset consists of 138,240 images of rendered hands and forearms holding 48 synthetic objects, split into 3 grasp categories over 30 real backgrounds.

2 PAPERS • NO BENCHMARKS YET

Fitness-AQA (Fitness Action Quality Assessment [ECCV 2022])

Largest, first-of-its-kind, in-the-wild, fine-grained workout/exercise posture analysis dataset, covering three different exercises: BackSquat, Barbell Row, and Overhead Press. Seven different types of exercise errors are covered. Unlabeled data is also provided to facilitate self-supervised learning.

2 PAPERS • NO BENCHMARKS YET

HOPE-Video (Household Objects for Pose Estimation)

The HOPE-Video dataset contains 10 video sequences (2038 frames) with 5-20 objects on a tabletop scene captured by a robot arm-mounted RealSense D415 RGBD camera. In each sequence, the camera is moved to capture multiple views of a set of objects in the robotic workspace. First, COLMAP was applied to refine the camera poses (keyframes at 6 fps) provided by forward kinematics and RGB calibration from RealSense to Baxter's wrist camera. A dense 3D point cloud was then generated via CascadeStereo (included for each sequence in 'scene.ply'). Ground truth poses for the HOPE object models in the world coordinate system were annotated manually using the CascadeStereo point clouds. The following are provided for each frame:

2 PAPERS • NO BENCHMARKS YET

MBW - Zoo Dataset

Dataset page: https://github.com/mosamdabhi/MBW-Data

2 PAPERS • NO BENCHMARKS YET

MERL-RAV (MERL Reannotation of AFLW with Visibility)

The MERL-RAV (MERL Reannotation of AFLW with Visibility) Dataset contains over 19,000 face images in a full range of head poses. Each face is manually labeled with the ground-truth locations of 68 landmarks, with the additional information of whether each landmark is unoccluded, self-occluded (due to extreme head poses), or externally occluded. The images were annotated by professional labelers, supervised by researchers at Mitsubishi Electric Research Laboratories (MERL).

2 PAPERS • 2 BENCHMARKS

MOTFront

MOTFront provides photo-realistic RGB-D images with their corresponding instance segmentation masks, class labels, 2D & 3D bounding boxes, 3D geometry, 3D poses and camera parameters. The MOTFront dataset comprises 2,381 unique indoor sequences with a total of 60,000 images and is based on the 3D-FRONT dataset.

2 PAPERS • NO BENCHMARKS YET

Parkinson's Pose Estimation Dataset

The data includes all movement trajectories extracted from the videos of Parkinson's assessments using Convolutional Pose Machines (CPM) as well as the confidence values from CPM. The dataset also includes ground truth ratings of parkinsonism and dyskinesia severity using the UDysRS, UPDRS, and CAPSIT.

2 PAPERS • NO BENCHMARKS YET

Poser

The Poser dataset is a dataset for pose estimation which consists of 1927 training and 418 test images. These images are synthetically generated and tuned to unimodal predictions. The images were generated using the Poser software package.

2 PAPERS • NO BENCHMARKS YET

Rendered Handpose Dataset

Rendered Handpose Dataset contains 41258 training and 2728 testing samples. Each sample provides:

2 PAPERS • NO BENCHMARKS YET

Retinal Microsurgery

The Retinal Microsurgery dataset is a dataset for surgical instrument tracking. It consists of 18 in-vivo sequences, each with 200 frames of resolution 1920 × 1080 pixels. The dataset is further classified into four instrument-dependent subsets. There are n=3 annotated tool joints and c=2 semantic classes (tool and background).

2 PAPERS • NO BENCHMARKS YET

UBC3V Dataset

~6 million synthetic depth frames for pose estimation from multiple cameras.

2 PAPERS • NO BENCHMARKS YET

Amateur Drawings

Amateur Drawings is a dataset collected via the public demo of Animated Drawings, containing over 178,000 amateur drawings and corresponding user-accepted character bounding boxes, segmentation masks, and joint location annotations.

1 PAPER • NO BENCHMARKS YET

BigHand2.2M Benchmark

A large-scale hand pose dataset, collected using a novel capture method.

1 PAPER • NO BENCHMARKS YET

CIP (Complete Inertial Pose)

The CIP dataset is composed of 2 subsets, containing low-cost (MPU9250) and high-end (MTwAwinda) Magnetic, Angular Rate, and Gravity (MARG) sensor data respectively. It provides data for the analysis of the complete inertial pose pipeline, from raw measurements, to sensor-to-segment calibration, multi-sensor fusion, skeleton kinematics, to the complete human pose. Multiple trials were collected with 21 and 10 subjects respectively, performing 6 types of movements (ranging from calibration, to daily activities, range-of-motion and random). It presents a high degree of variability and complex dynamics while containing common sources of error found in real conditions. This amounts to 3.5M samples, synchronized with a ground-truth inertial motion capture system (Xsens) at 60 Hz. This dataset may help to assess, benchmark and develop novel algorithms for each of the pipeline's processing steps, with applications in classic or data-driven inertial pose estimation algorithms, human movem

1 PAPER • NO BENCHMARKS YET

DensePose-Track

DensePose-Track is a dataset of videos where selected frames are annotated in the traditional DensePose manner.

1 PAPER • NO BENCHMARKS YET

Desert Locust

Desert Locust is an animal pose estimation dataset for desert locusts.

1 PAPER • 1 BENCHMARK

Drunkard's Dataset

Estimating camera motion in deformable scenes poses a complex and open research challenge. Most existing non-rigid structure from motion techniques assume that static scene parts are observed alongside deforming scene parts in order to establish an anchoring reference. However, this assumption does not hold true in certain relevant application cases such as endoscopies. To tackle this issue with a common benchmark, we introduce the Drunkard’s Dataset, a challenging collection of synthetic data targeting visual navigation and reconstruction in deformable environments. This dataset is the first large set of exploratory camera trajectories with ground truth inside 3D scenes where every surface exhibits non-rigid deformations over time. Simulations in realistic 3D buildings let us obtain a vast amount of data and ground truth labels, including camera poses, RGB images and depth, optical flow and normal maps at high resolution and quality.

1 PAPER • 1 BENCHMARK

Fine-grained 3D Pose

A new large-scale dataset that consists of 409 fine-grained categories and 31,881 images with accurate 3D pose annotation.

1 PAPER • NO BENCHMARKS YET

Halpe-FullBody

Halpe-FullBody is a full-body keypoints dataset in which each person is annotated with 136 keypoints, including 20 for body, 6 for feet, 42 for hands and 68 for face. It is designed for the task of whole-body human pose estimation.

1 PAPER • NO BENCHMARKS YET

HuPR (Human Pose with Millimeter Wave Radar)

HuPR is a human pose estimation benchmark created using cross-calibrated mmWave radar sensors and a monocular RGB camera for cross-modality training of radar-based human pose estimation. This dataset contains 235 sequences of data in an indoor environment, with each sequence being one minute long, totalling about 4 hours of video data.

1 PAPER • NO BENCHMARKS YET
