Whole-Body Human Pose Estimation in the Wild

This paper investigates the task of 2D human whole-body pose estimation, which aims to localize dense landmarks on the entire human body, including the face, hands, body, and feet. As existing datasets lack whole-body annotations, previous methods have to assemble separate deep models trained independently on datasets of the human face, hand, and body, and consequently struggle with dataset biases and large model complexity. To fill this gap, we introduce COCO-WholeBody, which extends the COCO dataset with whole-body annotations. To the best of our knowledge, it is the first benchmark with manual annotations of the entire human body, comprising 133 dense landmarks: 68 on the face, 42 on the hands, and 23 on the body and feet. A single-network model, named ZoomNet, is devised to exploit the hierarchical structure of the full human body and handle the scale variation across different body parts of the same person. ZoomNet significantly outperforms existing methods on the proposed COCO-WholeBody dataset. Extensive experiments show that COCO-WholeBody can not only be used to train deep models from scratch for whole-body pose estimation but also serve as a powerful pre-training dataset for many different tasks, such as facial landmark detection and hand keypoint estimation. The dataset is publicly available at https://github.com/jin-s13/COCO-WholeBody.
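The 133-landmark breakdown above (23 body+feet, 68 face, 42 hands) can be made concrete with a small sketch that slices a whole-body keypoint array into its parts. The slice boundaries below follow the commonly used ordering (17 body, 6 foot, 68 face, 21 left hand, 21 right hand); treat that exact ordering as an assumption and verify it against the dataset's annotation files.

```python
import numpy as np

# Assumed COCO-WholeBody keypoint ordering: body, feet, face, left hand, right hand.
# 17 + 6 + 68 + 21 + 21 = 133 landmarks in total.
PART_SLICES = {
    "body":       slice(0, 17),
    "foot":       slice(17, 23),
    "face":       slice(23, 91),
    "left_hand":  slice(91, 112),
    "right_hand": slice(112, 133),
}

def split_wholebody_keypoints(kpts):
    """Split a (133, 3) array of (x, y, visibility) keypoints into named parts."""
    kpts = np.asarray(kpts)
    assert kpts.shape[0] == 133, "expected 133 whole-body keypoints"
    return {name: kpts[s] for name, s in PART_SLICES.items()}

# Example: a dummy pose with all keypoints at the origin.
parts = split_wholebody_keypoints(np.zeros((133, 3)))
print({name: p.shape[0] for name, p in parts.items()})
# -> {'body': 17, 'foot': 6, 'face': 68, 'left_hand': 21, 'right_hand': 21}
```

Part-wise slicing like this is also how the per-part metrics in the results table (body, foot, face, hand) are computed separately from the whole-body (WB) score.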

Published at ECCV 2020.


Introduced in the Paper:

COCO-WholeBody

Used in the Paper:

COCO, DensePose, 300W
Benchmark results

Task: 2D Human Pose Estimation
Dataset: COCO-WholeBody
Model: ZoomNet (V0.5 data)

Metric   Value   Global Rank
WB       54.1    #6
body     74.3    #2
foot     79.8    #1
face     62.3    #7
hand     40.1    #7