The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.
5,161 PAPERS • 66 BENCHMARKS
The MPII Human Pose Dataset for single person pose estimation is composed of about 25K images of which 15K are training samples, 3K are validation samples and 7K are testing samples (which labels are withheld by the authors). The images are taken from YouTube videos covering 410 different human activities and the poses are manually annotated with up to 16 body joints.
355 PAPERS • 4 BENCHMARKS
The PoseTrack dataset is a large-scale benchmark for multi-person pose estimation and tracking in videos. It requires not only pose estimation in single frames, but also temporal tracking across frames. It contains 514 videos including 66,374 frames in total, split into 300, 50 and 208 videos for training, validation and test set respectively. For training videos, 30 frames from the center are annotated. For validation and test videos, besides 30 frames from the center, every fourth frame is also annotated for evaluating long range articulated tracking. The annotations include 15 body keypoints location, a unique person id and a head bounding box for each person instance.
57 PAPERS • 5 BENCHMARKS
The CrowdPose dataset contains about 20,000 images and a total of 80,000 human poses with 14 labeled keypoints. The test set includes 8,000 images. The crowded images containing homes are extracted from MSCOCO, MPII and AI Challenger.
30 PAPERS • 1 BENCHMARK