Rethinking pose estimation in crowds: overcoming the detection information-bottleneck and ambiguity

13 Jun 2023 · Mu Zhou, Lucas Stoffl, Mackenzie Weygandt Mathis, Alexander Mathis

Frequent interactions between individuals are a fundamental challenge for pose estimation algorithms. Current pipelines either use an object detector together with a pose estimator (top-down approach), or localize all body parts first and then link them to predict the pose of individuals (bottom-up approach). Yet, when individuals closely interact, top-down methods are ill-defined due to overlapping individuals, and bottom-up methods often falsely infer connections to distant body parts. Thus, we propose a novel pipeline called bottom-up conditioned top-down pose estimation (BUCTD) that combines the strengths of bottom-up and top-down methods. Specifically, we propose to use a bottom-up model as the detector, which, in addition to an estimated bounding box, provides a pose proposal that is fed as a condition to an attention-based top-down model. We demonstrate the performance and efficiency of our approach on animal and human pose estimation benchmarks. On CrowdPose and OCHuman, we outperform previous state-of-the-art models by a significant margin. We achieve 78.5 AP on CrowdPose and 48.5 AP on OCHuman, an improvement of 8.6% and 7.8% over the prior art, respectively. Furthermore, we show that our method strongly improves the performance on multi-animal benchmarks involving fish and monkeys. The code is available at https://github.com/amathislab/BUCTD
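The data flow described in the abstract (a bottom-up model yields, per individual, a bounding box plus a pose proposal that conditions a top-down model) can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Proposal`, `bbox_from_pose`, `to_crop_coords`, and `make_proposals`, the `margin` padding, and the choice to derive the box from the predicted keypoints are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical types for illustration only.
Keypoint = Tuple[float, float, float]      # (x, y, confidence)
BBox = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)


@dataclass
class Proposal:
    bbox: BBox                 # region handed to the top-down model
    pose: List[Keypoint]       # bottom-up pose in image coordinates
    cond_pose: List[Keypoint]  # same pose relative to the crop origin


def bbox_from_pose(pose: List[Keypoint], margin: float = 0.25) -> BBox:
    """Tight box around the keypoints, padded by `margin` times the
    box width/height on each side (the padding rule is an assumption)."""
    if not pose:
        raise ValueError("pose must contain at least one keypoint")
    xs = [x for x, _, _ in pose]
    ys = [y for _, y, _ in pose]
    x_min, x_max, y_min, y_max = min(xs), max(xs), min(ys), max(ys)
    pad_x = margin * (x_max - x_min)
    pad_y = margin * (y_max - y_min)
    return (x_min - pad_x, y_min - pad_y, x_max + pad_x, y_max + pad_y)


def to_crop_coords(pose: List[Keypoint], bbox: BBox) -> List[Keypoint]:
    """Shift keypoints so the top-left corner of the crop is the origin."""
    x0, y0 = bbox[0], bbox[1]
    return [(x - x0, y - y0, c) for x, y, c in pose]


def make_proposals(bottom_up_poses: List[List[Keypoint]],
                   margin: float = 0.25) -> List[Proposal]:
    """Turn each bottom-up pose into a (bbox, conditioning pose) pair."""
    proposals = []
    for pose in bottom_up_poses:
        bbox = bbox_from_pose(pose, margin)
        proposals.append(Proposal(bbox, pose, to_crop_coords(pose, bbox)))
    return proposals
```

Each resulting `Proposal` would then be passed, together with the image crop, to a conditioned top-down model. That model is left out here because its interface is not specified in the abstract.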

Results from the Paper


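Task | Dataset | Model | Metric Name | Metric Value | Global Rank
Multi-Person Pose Estimation | CrowdPose | BUCTD-W48 (w/ cond. input from PETR, and generative sampling) | mAP @0.5:0.95 | 78.5 | #2
Multi-Person Pose Estimation | CrowdPose | BUCTD-W48 (w/ cond. input from PETR, and generative sampling) | AP Easy | 83.9 | #2
Multi-Person Pose Estimation | CrowdPose | BUCTD-W48 (w/ cond. input from PETR, and generative sampling) | AP Medium | 79.0 | #2
Multi-Person Pose Estimation | CrowdPose | BUCTD-W48 (w/ cond. input from PETR, and generative sampling) | AP Hard | 72.3 | #2
Pose Estimation | CrowdPose | BUCTD-W48 (w/ cond. input from PETR, and generative sampling) | AP | 78.5 | #1
Pose Estimation | CrowdPose | BUCTD-W48 (w/ cond. input from PETR, and generative sampling) | AP Hard | 72.3 | #1
Pose Estimation | CrowdPose | BUCTD-W48 (w/ cond. input from PETR, and generative sampling) | AP Easy | 83.9 | #1
Pose Estimation | CrowdPose | BUCTD-W48 (w/ cond. input from PETR, and generative sampling) | AP Medium | 79.0 | #2
Pose Estimation | CrowdPose | BUCTD-W48 | AP | 72.9 | #6
Pose Estimation | CrowdPose | BUCTD-W48 (w/ cond. input from PETR) | AP | 76.7 | #3
Animal Pose Estimation | Fish-100 | BUCTD-preNet-W48 (DLCRNet) | mAP | 88.7 | #2
Animal Pose Estimation | Fish-100 | BUCTD-preNet-W48 (CID-W32) | mAP | 88.0 | #3
Animal Pose Estimation | Fish-100 | HRNet-W48 + Faster R-CNN | mAP | 89.1 | #1
Animal Pose Estimation | Marmoset-8K | BUCTD-preNet-W48 (CID-W32) | mAP | 93.3 | #1
Animal Pose Estimation | Marmoset-8K | BUCTD-CoAM-W48 (DLCRNet) | mAP | 91.6 | #3
Animal Pose Estimation | Marmoset-8K | CID-W32 | mAP | 92.5 | #2
Pose Estimation | MS COCO | BUCTD (PETR, with generative sampling) | APM | 74.2 | #2
Pose Estimation | MS COCO | BUCTD (PETR, with generative sampling) | APL | 83.7 | #3
Pose Estimation | MS COCO | BUCTD (PETR, with generative sampling) | AP | 77.8 | #4
Pose Estimation | OCHuman | BUCTD (CID-W32) | Test AP | 47.2 | #5
Pose Estimation | OCHuman | BUCTD (CID-W32) | Validation AP | 47.7 | #5
Animal Pose Estimation | TriMouse-161 | BUCTD-CoAM-W48 (DLCRNet) | mAP | 99.1 | #1
Animal Pose Estimation | TriMouse-161 | DLCRNet | mAP | 95.8 | #3
Animal Pose Estimation | TriMouse-161 | CID-W32 | mAP | 86.8 | #6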
