Keypoint detection involves simultaneously detecting people and localizing their keypoints. Keypoints are the same thing as interest points. They are spatial locations, or points in the image that define what is interesting or what stand out in the image. They are invariant to image rotation, shrinkage, translation, distortion, and so on.
( Image credit: PifPaf: Composite Fields for Human Pose Estimation; "Learning to surf" by fotologic, license: CC-BY-2.0 )
|TREND||DATASET||BEST METHOD||PAPER TITLE||PAPER||CODE||COMPARE|
This paper presents HoughNet, a one-stage, anchor-free, voting-based, bottom-up object detection method.
Here we utilize the encoder-decoder structure in Transformers to perform regression-based person and keypoint detection that is general-purpose and requires less heuristic design compared with the existing approaches.
Our motivation is that regressing keypoint positions accurately needs to learn representations that focus on the keypoint regions.
More specifically, we design a Geometry-aware Association GNN that utilizes spatial information of the keypoints and learns local affinity from the global context.
Multi-frame human pose estimation in complicated situations is challenging.
First, based on our observation that the probability density of the output space is sparse, we introduce a spatial probability distribution to describe this sparsity and then use it to guide the learning of the adversarial regressor.
We present a generic neural network architecture that uses Composite Fields to detect and construct a spatio-temporal pose which is a single, connected graph whose nodes are the semantic keypoints (e. g., a person's body joints) in multiple frames.
Ranked #4 on Multi-Person Pose Estimation on COCO
Attention modules have been demonstrated effective in strengthening the representation ability of a neural network via reweighting spatial or channel features or stacking both operations sequentially.