The Center of Attention: Center-Keypoint Grouping via Attention for Multi-Person Pose Estimation
We introduce CenterGroup, an attention-based framework that estimates human poses from a set of identity-agnostic keypoint and person-center predictions in an image. Our approach uses a transformer to obtain context-aware embeddings for all detected keypoints and centers, and then applies multi-head attention to group joints directly into their corresponding person centers. While most bottom-up methods rely on non-learnable clustering at inference, CenterGroup uses a fully differentiable attention mechanism that we train end-to-end together with our keypoint detector. As a result, our method achieves state-of-the-art performance with up to 2.5x faster inference than competing bottom-up methods. Our code is available at https://github.com/dvl-tum/center-group.
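To make the grouping idea concrete, here is a minimal pure-Python sketch of attention-based soft grouping. It is not the authors' implementation (see the linked repository for that). It assumes that each person-center embedding acts as a query and the embeddings of candidate keypoints of one joint type act as keys. A softmax over candidates then yields grouping weights, and the joint location is the weight-averaged candidate position. The function `group_joint`, the `num_heads` parameter, averaging weights across heads, and the omission of learned query/key projections and of the transformer encoder are all illustrative assumptions. Because every step is a smooth function of the embeddings, the grouping stays differentiable, unlike a hard clustering step.

```python
import math
from typing import List, Tuple

Vector = List[float]
Point = Tuple[float, float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def group_joint(
    center_embs: List[Vector],
    kp_embs: List[Vector],
    kp_xy: List[Point],
    num_heads: int = 1,
) -> List[Tuple[Point, List[float]]]:
    """Hypothetical soft grouping of ONE joint type's candidates to every center.

    Centers are queries and keypoints are keys. Scaled dot-product attention is
    computed per head on a slice of the embedding, and the per-head weights are
    averaged (an assumption). Learned projections are omitted (identity).
    Returns, per center, the attention-weighted joint location and the weights.
    """
    d = len(kp_embs[0])
    if d % num_heads:
        raise ValueError("embedding size must be divisible by num_heads")
    hd = d // num_heads
    results = []
    for c in center_embs:
        weights = [0.0] * len(kp_embs)
        for h in range(num_heads):
            sl = slice(h * hd, (h + 1) * hd)
            scores = [dot(c[sl], k[sl]) / math.sqrt(hd) for k in kp_embs]
            for i, wi in enumerate(softmax(scores)):
                weights[i] += wi / num_heads
        # Soft assignment: location is a convex combination of candidates.
        x = sum(wi * p[0] for wi, p in zip(weights, kp_xy))
        y = sum(wi * p[1] for wi, p in zip(weights, kp_xy))
        results.append(((x, y), weights))
    return results


if __name__ == "__main__":
    # Toy embeddings: two person centers, two candidate wrists (made-up data).
    centers = [[4.0, 0.0], [0.0, 4.0]]
    wrists = [[4.0, 0.0], [0.0, 4.0]]
    wrist_xy = [(10.0, 20.0), (50.0, 60.0)]
    for (x, y), w in group_joint(centers, wrists, wrist_xy):
        print(f"{x:.2f} {y:.2f}")
    # 10.00 20.00
    # 50.00 60.00
```

With well-separated embeddings the weights are nearly one-hot, so each center picks up "its" wrist. Because the assignment stays soft, a loss on the predicted joint locations can backpropagate into the embeddings, which is what allows end-to-end training.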
ICCV 2021

Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
---|---|---|---|---|---|
Multi-Person Pose Estimation | CrowdPose | CenterGroup | mAP @0.5:0.95 | 69.4 | #11 |
Multi-Person Pose Estimation | CrowdPose | CenterGroup | AP Easy | 76.6 | #9 |
Multi-Person Pose Estimation | CrowdPose | CenterGroup | AP Medium | 70.0 | #10 |
Multi-Person Pose Estimation | CrowdPose | CenterGroup | AP Hard | 61.5 | #7 |
Multi-Person Pose Estimation | MS COCO | CenterGroup | AP | 0.714 | #6 |
Multi-Person Pose Estimation | MS COCO | CenterGroup | Test AP | 71.4 | #3 |