DiffusionRegPose: Enhancing Multi-Person Pose Estimation using a Diffusion-Based End-to-End Regression Approach

CVPR 2024  ·  Dayi Tan, Hansheng Chen, Wei Tian, Lu Xiong ·

This paper presents the DiffusionRegPose a novel approach to multi-person pose estimation that converts a one-stage end-to-end keypoint regression model into a diffusion-based sampling process. Existing one-stage deterministic regression methods though efficient are often prone to missed or false detections in crowded or occluded scenes due to their inability to reason pose ambiguity. To address these challenges we handle ambiguous poses in a generative fashion i.e. sampling from the image-conditioned pose distributions characterized by a diffusion probabilistic model. Specifically with initial pose tokens extracted from the image noisy pose candidates are progressively refined by interacting with the initial tokens via attention layers. Extensive evaluations on the COCO and CrowdPose datasets show that DiffusionRegPose clearly improves the pose accuracy in crowded scenarios as evidenced by a notable 4.0 AP increase in the AP_H metric on the CrowdPose dataset. This demonstrates the model's potential for robust and precise human pose estimation in real-world applications.

PDF Abstract

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods