Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle

arXiv 2024 · Miroslav Purkrabek, Jiri Matas

Human pose estimation methods work well on separated people but struggle in multi-body scenarios. Recent work has addressed this problem by conditioning pose estimation on detected bounding boxes or on poses estimated bottom-up. However, all of these approaches overlook segmentation masks and their connection to estimated keypoints. We condition the pose estimation model on segmentation masks instead of bounding boxes to improve instance separation. This improves top-down pose estimation in multi-body scenarios but does not fix detection errors. Consequently, we develop BBox-Mask-Pose (BMP), which integrates detection, segmentation and pose estimation into a self-improving feedback loop. We adapt the detector and the pose estimation model for conditioning on instance masks and use Segment Anything as the pose-to-mask model to close the circle. With only small models, BMP is superior to top-down methods on the OCHuman dataset and to detector-free methods on the COCO dataset, combining the best of both approaches and matching state-of-the-art performance in both settings. Code is available at https://mirapurkrabek.github.io/BBox-Mask-Pose.
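The feedback loop described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the names `Instance`, `bmp_loop`, `detect`, `estimate_pose` and `pose_to_mask`, the `iterations` parameter, and the exact control flow are hypothetical and are not taken from the authors' code. The three callables stand in for a mask-conditioned detector, a mask-conditioned pose model, and a pose-to-mask model such as Segment Anything.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, FrozenSet, List, Tuple

Point = Tuple[int, int]
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)
Mask = FrozenSet[Point]                  # toy mask: a set of pixel coordinates


@dataclass
class Instance:
    """One detected person (hypothetical container, not the authors' type)."""
    box: Box
    mask: Mask
    keypoints: List[Point] = field(default_factory=list)


def bmp_loop(
    image: Any,
    detect: Callable[[Any, List[Mask]], List[Instance]],
    estimate_pose: Callable[[Any, Mask], List[Point]],
    pose_to_mask: Callable[[Any, List[Point], Mask], Mask],
    iterations: int = 2,
) -> List[Instance]:
    """Run a detection -> pose -> mask loop for a fixed number of passes."""
    instances: List[Instance] = []
    for _ in range(iterations):
        # Condition the detector on masks of the people already found,
        # so it can concentrate on bodies that are still unexplained.
        known_masks = [inst.mask for inst in instances]
        new_instances = detect(image, known_masks)
        if not new_instances:
            break  # nothing new was found; stop early
        for inst in new_instances:
            # Pose estimation conditioned on the instance mask.
            inst.keypoints = estimate_pose(image, inst.mask)
            # Refine the mask from the estimated pose, closing the circle.
            inst.mask = pose_to_mask(image, inst.keypoints, inst.mask)
        instances.extend(new_instances)
    return instances
```

Passing the three stages in as callables keeps the sketch independent of any particular detector, pose model or segmentation library, so each one can be replaced with a real model without changing the loop.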

Results from the Paper


 Ranked #1 on 2D Human Pose Estimation on OCHuman (using extra training data)

Task                         Dataset  Model              Metric Name    Metric Value  Global Rank
2D Human Pose Estimation     OCHuman  BBox-Mask-Pose 2x  Test AP        48.3          #1
2D Human Pose Estimation     OCHuman  BBox-Mask-Pose 2x  Validation AP  48.6          #1
Human Instance Segmentation  OCHuman  BBox-Mask-Pose 2x  AP             32.4          #1
Human Instance Segmentation  OCHuman  RTMDet-ins-l       AP             26.5          #10
Pose Estimation              OCHuman  MaskPose-b         Test AP        45.0          #7
Pose Estimation              OCHuman  MaskPose-b         Validation AP  45.3          #7
Pose Estimation              OCHuman  BBox-Mask-Pose 2x  Test AP        48.3          #4
Pose Estimation              OCHuman  BBox-Mask-Pose 2x  Validation AP  48.6          #4
Keypoint Detection           OCHuman  BBox-Mask-Pose 2x  Test AP        48.3          #1
Keypoint Detection           OCHuman  BBox-Mask-Pose 2x  Validation AP  48.6          #1
