UniPose: Detecting Any Keypoints

12 Oct 2023 · Jie Yang, Ailing Zeng, Ruimao Zhang, Lei Zhang

This work proposes a unified framework called UniPose to detect keypoints of any articulated (e.g., human and animal), rigid, and soft objects via visual or textual prompts for fine-grained vision understanding and manipulation. A keypoint is a structure-aware, pixel-level, and compact representation of any object, especially articulated objects. Existing fine-grained promptable tasks mainly focus on object instance detection and segmentation, but they often fail to identify fine-grained, structured information within images and instances, such as eyes, legs, and paws. Meanwhile, prompt-based keypoint detection is still under-explored. To bridge the gap, we make the first attempt to develop an end-to-end prompt-based keypoint detection framework, UniPose, that detects keypoints of any object. Because keypoint detection tasks are unified in this framework, we can leverage 13 keypoint detection datasets, covering 338 keypoints across 1,237 categories and over 400K instances, to train a generic keypoint detection model. UniPose effectively aligns text to keypoints and images to keypoints, owing to the mutual enhancement of textual and visual prompts under cross-modality contrastive learning objectives. Our experimental results show that UniPose has strong fine-grained localization and generalization abilities across image styles, categories, and poses. With UniPose as a generalist keypoint detector, we hope it can serve fine-grained visual perception, understanding, and generation.
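The abstract says UniPose aligns keypoints with textual and visual prompts through cross-modality contrastive objectives, but it does not give the exact loss. As an assumption, a common way to do this kind of alignment is a symmetric InfoNCE loss over temperature-scaled cosine similarities between N keypoint features and their N matching prompt embeddings. The sketch below shows that generic technique only. The names `info_nce`, `_normalize` and `_log_softmax_at` and the temperature of 0.07 are hypothetical and are not taken from UniPose.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm (assumes v is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _log_softmax_at(logits: Sequence[float], index: int) -> float:
    """Numerically stable log-softmax of `logits`, evaluated at one index."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[index] - log_sum


def info_nce(keypoint_feats: Sequence[Vector],
             prompt_feats: Sequence[Vector],
             temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss; row i of each input is assumed to be a matching pair.

    Hypothetical sketch of a generic cross-modal contrastive objective,
    not UniPose's published loss.
    """
    if len(keypoint_feats) != len(prompt_feats) or not keypoint_feats:
        raise ValueError("need the same, non-zero number of keypoint and prompt features")
    k = [_normalize(v) for v in keypoint_feats]
    p = [_normalize(v) for v in prompt_feats]
    n = len(k)
    # sim[i][j]: cosine similarity of keypoint i and prompt j, divided by the temperature.
    sim = [[sum(a * b for a, b in zip(k[i], p[j])) / temperature for j in range(n)]
           for i in range(n)]
    # Keypoint -> prompt: each row should assign most probability to its own prompt.
    loss_k2p = -sum(_log_softmax_at(sim[i], i) for i in range(n)) / n
    # Prompt -> keypoint: each column should assign most probability to its own keypoint.
    cols = [[sim[i][j] for i in range(n)] for j in range(n)]
    loss_p2k = -sum(_log_softmax_at(cols[j], j) for j in range(n)) / n
    return 0.5 * (loss_k2p + loss_p2k)


kp = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(kp, [[0.9, 0.1], [0.1, 0.9]])   # matching pairs point the same way
swapped = info_nce(kp, [[0.1, 0.9], [0.9, 0.1]])   # pairs deliberately mismatched
print(aligned < swapped)  # True
```

The loss is small when each keypoint feature is most similar to its own prompt and grows when the pairs are mismatched. Minimizing it therefore pulls matching keypoint and prompt embeddings together in a shared space.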


Results from the Paper

 Ranked #1 on 2D Human Pose Estimation on Human-Art (using extra training data)

Task                          Dataset         Model    Metric        Value  Global Rank
2D Pose Estimation            300W            UniPose  Mean PCK@0.2  99.4   #1
2D Pose Estimation            Animal Kingdom  UniPose  Mean PCK@0.2  96.1   #1
2D Pose Estimation            Animal Kingdom  UniPose  PCK@0.05      71.5   #1
Animal Pose Estimation        AP-10K          UniPose  AP            79.2   #4
2D Pose Estimation            Desert Locust   UniPose  Mean PCK@0.2  99.9   #1
2D Human Pose Estimation      Human-Art       UniPose  AP            0.759  #1
2D Pose Estimation            MacaquePose     UniPose  AP            79.4   #1
Multi-Person Pose Estimation  MS COCO         UniPose  AP            0.768  #3
2D Pose Estimation            Vinegar Fly     UniPose  Mean PCK@0.2  99.9   #1
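Several rows report PCK (Percentage of Correct Keypoints) at a threshold of 0.2 or 0.05. A predicted keypoint counts as correct when its distance to the ground truth is at most the threshold times a normalizing length. The listing does not say which length each benchmark uses (for example, bounding-box size or an inter-ocular distance), so the sketch below takes it as a parameter. The helper name `pck` and its arguments are hypothetical. The sketch also assumes that unlabeled keypoints are excluded. Averaging this per-instance value over a dataset is one common way to obtain a "Mean PCK".

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def pck(pred: Sequence[Point],
        gt: Sequence[Point],
        visible: Sequence[bool],
        norm_length: float,
        threshold: float = 0.2) -> float:
    """Fraction of visible keypoints whose prediction lies within
    threshold * norm_length of the ground truth, for one instance.

    Hypothetical helper; the normalizing length is benchmark-specific.
    """
    if norm_length <= 0:
        raise ValueError("norm_length must be positive")
    correct = 0
    total = 0
    for (px, py), (gx, gy), vis in zip(pred, gt, visible):
        if not vis:
            continue  # unlabeled keypoints are left out of the denominator
        total += 1
        if math.hypot(px - gx, py - gy) <= threshold * norm_length:
            correct += 1
    if total == 0:
        raise ValueError("no visible keypoints to evaluate")
    return correct / total


gt = [(10.0, 10.0), (50.0, 50.0), (90.0, 10.0)]
pred = [(12.0, 10.0), (70.0, 50.0), (90.0, 10.0)]
visible = [True, True, False]
# Radius = 0.2 * 50 = 10: the first keypoint is off by 2 (correct), the second by 20 (wrong).
print(pck(pred, gt, visible, norm_length=50.0))  # 0.5
```

A stricter threshold such as 0.05 shrinks the acceptance radius. This is why PCK@0.05 on Animal Kingdom is much lower than PCK@0.2 on the same dataset.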