BAPose: Bottom-Up Pose Estimation with Disentangled Waterfall Representations

20 Dec 2021 · Bruno Artacho, Andreas Savakis

We propose BAPose, a novel bottom-up approach that achieves state-of-the-art results for multi-person pose estimation. Our end-to-end trainable framework leverages a disentangled multi-scale waterfall architecture and incorporates adaptive convolutions to infer keypoints more precisely in crowded scenes with occlusions. The multi-scale representations, obtained by the disentangled waterfall module in BAPose, leverage the efficiency of progressive filtering in the cascade architecture, while maintaining multi-scale fields-of-view comparable to spatial pyramid configurations. Our results on the challenging COCO and CrowdPose datasets demonstrate that BAPose is an efficient and robust framework for multi-person pose estimation, achieving significant improvements on state-of-the-art accuracy.
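The trade-off the abstract describes, cascade-style progressive filtering that still yields multi-scale fields-of-view, can be illustrated with plain receptive-field arithmetic. The sketch below is not the BAPose implementation. It assumes 3x3 dilated (atrous) convolutions with stride 1 and an illustrative set of dilation rates (6, 12, 18, 24), none of which are stated in the abstract, and it does not model the disentangling or the adaptive convolutions. It compares the field-of-view of parallel pyramid branches with the fields-of-view available when every stage of a cascade is tapped, as in a waterfall arrangement.

```python
from itertools import accumulate


def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Span in pixels, along one axis, covered by a single dilated convolution."""
    return dilation * (kernel_size - 1) + 1


def parallel_fovs(dilations, kernel_size=3):
    """Spatial-pyramid style: each branch filters the input directly."""
    return [effective_kernel(kernel_size, d) for d in dilations]


def waterfall_fovs(dilations, kernel_size=3):
    """Waterfall style: each stage filters the previous stage's output
    (stride 1), so receptive fields accumulate, and every stage is tapped."""
    growth = [effective_kernel(kernel_size, d) - 1 for d in dilations]
    return [1 + total for total in accumulate(growth)]


# Hypothetical dilation rates chosen for illustration only.
ASSUMED_DILATIONS = (6, 12, 18, 24)

print("parallel branches:", parallel_fovs(ASSUMED_DILATIONS))
print("waterfall stages: ", waterfall_fovs(ASSUMED_DILATIONS))
```

Under these assumptions, each tapped cascade stage sees a field-of-view at least as large as the corresponding parallel branch, while every stage reuses the filtering already done by the stages before it.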

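Task                          Dataset    Model         Metric Name    Metric Value  Global Rank
Multi-Person Pose Estimation  CrowdPose  BAPose (W32)  mAP @0.5:0.95  72.2          #6
Multi-Person Pose Estimation  CrowdPose  BAPose (W32)  AP Easy        79.9          #6
Multi-Person Pose Estimation  CrowdPose  BAPose (W32)  AP Medium      73.4          #6
Multi-Person Pose Estimation  CrowdPose  BAPose (W32)  AP Hard        61.3          #8
Multi-Person Pose Estimation  MS COCO    BAPose        AP             0.727         #5
Multi-Person Pose Estimation  MS COCO    BAPose        Validation AP  72.7          #3
Multi-Person Pose Estimation  MS COCO    BAPose        Test AP        71.2          #4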
