In this work, we propose sampling-argmax, a differentiable training method that imposes implicit constraints on the shape of the probability map by minimizing the expectation of the localization error.
As an illustrative example, with ArtiBoost, even a simple baseline network can outperform the previous Transformer-based state of the art on the HO3D dataset.
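To make "the expectation of the localization error" concrete, the following is a minimal sketch and not the authors' implementation. It assumes a 2D heatmap of logits normalized with a softmax, an L1 distance as the error, and an exact sum over all pixels where the method itself would rely on sampling. The function name `expected_localization_error` and the toy inputs are hypothetical.

```python
import math


def expected_localization_error(logits, target):
    """Expected L1 distance to `target` under softmax(logits).

    logits: list of rows (a 2D heatmap of unnormalized scores).
    target: (x, y) ground-truth location in pixel coordinates.
    Assumption: the expectation is summed exactly over the grid here,
    rather than approximated by sampling.
    """
    # Numerically stable softmax over the whole map.
    peak = max(max(row) for row in logits)
    weights = [[math.exp(v - peak) for v in row] for row in logits]
    total = sum(sum(row) for row in weights)

    tx, ty = target
    error = 0.0
    for y, row in enumerate(weights):
        for x, w in enumerate(row):
            # Probability of this pixel times its distance to the target.
            error += (w / total) * (abs(x - tx) + abs(y - ty))
    return error


# Toy 4x5 maps: one uniform, one sharply peaked at the target (x=2, y=1).
uniform = [[0.0] * 5 for _ in range(4)]
peaked = [row[:] for row in uniform]
peaked[1][2] = 50.0

uniform_error = expected_localization_error(uniform, (2, 1))
peaked_error = expected_localization_error(peaked, (2, 1))
```

Under these assumptions, a map concentrated on the target yields a much smaller expected error than a flat one, which is the sense in which minimizing this quantity constrains the shape of the probability map.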
In light of this, we propose a novel regression paradigm with Residual Log-likelihood Estimation (RLE) to capture the underlying output distribution.
Such spatial and attention features are deeply nested; the proposed framework therefore works in a mixed top-down and bottom-up manner.
In this paper, we present an explicit contact representation, the Contact Potential Field (CPF), and a learning-fitting hybrid framework, MIHO, for Modeling the Interaction of Hand and Object.
We show that HybrIK preserves both the accuracy of 3D pose and the realistic body structure of the parametric human model, leading to a pixel-aligned 3D body mesh and a more accurate 3D pose than the pure 3D keypoint estimation methods.
Ranked #5 on 3D Human Pose Estimation on 3DPW
The HMOR hierarchically encodes interaction information as the ordinal relations of depths and angles, capturing body-part-level and joint-level semantics while maintaining global consistency.
In light of these, we propose a detailed 2D-3D joint representation learning method.
Ranked #1 on Human-Object Interaction Detection on Ambiguious-HOI
In this paper, we propose a novel and efficient method to tackle the problem of pose estimation in the crowd and a new dataset to better evaluate algorithms.
Ranked #4 on Multi-Person Pose Estimation on CrowdPose
Multi-person articulated pose tracking in unconstrained videos is an important yet challenging problem.
Ranked #7 on Keypoint Detection on COCO test-challenge