We present a robust learning algorithm to detect and handle collisions in 3D deforming meshes.
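The excerpt does not describe the learning algorithm itself; as context for what a learned detector replaces or accelerates, here is a minimal classical broad-phase collision check over a deforming mesh's triangles (a brute-force sketch of our own, not the paper's method):

```python
import numpy as np

def triangle_aabbs(vertices, faces):
    """Axis-aligned bounding box (min, max corners) for every triangle."""
    tris = vertices[faces]                      # (F, 3, 3)
    return tris.min(axis=1), tris.max(axis=1)   # each (F, 3)

def aabb_overlaps(vertices, faces):
    """Return index pairs of triangles whose AABBs intersect (broad phase).

    O(F^2) brute force; a real system would refit a BVH per frame."""
    lo, hi = triangle_aabbs(vertices, faces)
    pairs = []
    for i in range(len(faces)):
        # Boxes i and j overlap iff they overlap on every axis.
        hit = np.all(lo[i] <= hi, axis=1) & np.all(hi[i] >= lo, axis=1)
        for j in np.nonzero(hit)[0]:
            if j > i and not set(faces[i]) & set(faces[j]):  # skip shared-vertex neighbors
                pairs.append((i, j))
    return pairs
```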
Most existing monocular 3D pose estimation approaches focus on only a single body part, neglecting the fact that the essential nuance of human motion is conveyed through a concert of subtle movements of the face, hands, and body.
The core intuition behind our method is that better drivability and generalization can be achieved by disentangling the driving signals from the remaining generative factors, which are not available during animation.
Therefore, we first propose (1) a large-scale dataset, InterHand2.6M, and (2) a baseline network, InterNet, for 3D interacting hand pose estimation from a single RGB image.
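The excerpt does not detail InterNet's architecture; a toy PyTorch sketch of a single-image baseline in this spirit (all layer sizes and head names are our assumptions, not the paper's) could look like:

```python
import torch
import torch.nn as nn
import torchvision.models as models

class ToyInterNet(nn.Module):
    """Toy sketch of an interacting-hand baseline (hypothetical sizes).

    From one RGB crop, predicts per-joint heatmaps for both hands
    (21 joints each), a per-hand presence/handedness probability, and
    the depth of one hand's root relative to the other."""

    def __init__(self, num_joints=42):
        super().__init__()
        backbone = models.resnet50(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-2])  # (B, 2048, 8, 8) for 256px input
        self.pose_head = nn.Conv2d(2048, num_joints, kernel_size=1)     # joint heatmaps
        self.handedness_head = nn.Linear(2048, 2)                       # left/right hand present?
        self.rel_root_head = nn.Linear(2048, 1)                         # relative root depth

    def forward(self, img):
        f = self.features(img)
        pooled = f.mean(dim=(2, 3))
        return {
            "heatmaps": self.pose_head(f),
            "handedness": torch.sigmoid(self.handedness_head(pooled)),
            "rel_root_depth": self.rel_root_head(pooled),
        }
```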
To construct FrankMocap, we build a state-of-the-art monocular 3D "hand" motion capture method by taking the hand part of the whole-body parametric model (SMPL-X).
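A minimal sketch of "taking the hand part" with the public smplx package (model-file paths and the zero poses are placeholders, not values from the paper):

```python
import torch
import smplx

# Assumes SMPL-X model files live under ./models (not shipped here).
model = smplx.create("models", model_type="smplx", gender="neutral",
                     use_pca=False, batch_size=1)

output = model(
    betas=torch.zeros(1, 10),            # body shape coefficients
    global_orient=torch.zeros(1, 3),     # root rotation (axis-angle)
    body_pose=torch.zeros(1, 63),        # 21 body joints x 3
    right_hand_pose=torch.zeros(1, 45),  # 15 hand joints x 3 (use_pca=False)
    left_hand_pose=torch.zeros(1, 45),
)
vertices = output.vertices               # (1, 10475, 3) full SMPL-X mesh
```

The hand sub-mesh itself can then be selected from the full mesh using a published SMPL-X-to-MANO vertex-id correspondence.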
We design our system to be trained in an end-to-end and weakly-supervised manner; therefore, it does not require ground-truth meshes.
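The excerpt leaves the weak supervision unspecified; a common substitute for ground-truth meshes is a 2D keypoint reprojection loss on the regressed 3D joints. A minimal PyTorch sketch, assuming a weak-perspective camera (our assumption, not the paper's):

```python
import torch

def weak_perspective_project(joints3d, scale, trans):
    """Project 3D joints (B, J, 3) with a weak-perspective camera.

    scale: (B, 1) and trans: (B, 2) are predicted per image."""
    return scale[:, None] * joints3d[..., :2] + trans[:, None]

def reprojection_loss(joints3d, scale, trans, kp2d, conf):
    """L1 loss against detected 2D keypoints, weighted by confidence.

    kp2d: (B, J, 2) keypoints, conf: (B, J) detector confidences --
    weak labels only, so no ground-truth meshes are needed."""
    proj = weak_perspective_project(joints3d, scale, trans)
    return (conf[..., None] * (proj - kp2d).abs()).mean()
```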
In this paper, we propose a self-supervised domain adaptation approach to enable the animation of high-fidelity face models from a commodity camera.
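The excerpt does not spell out the self-supervision signal; a common recipe is to compare a differentiably rendered face model against the commodity-camera frame. A minimal sketch of such a photometric term (all names and tensor shapes are our assumptions):

```python
import torch

def photometric_loss(rendered, target, mask):
    """Self-supervised photometric loss on the face region.

    rendered: (B, 3, H, W) differentiably rendered face model,
    target:   (B, 3, H, W) commodity-camera frame,
    mask:     (B, 1, H, W) face-region mask."""
    diff = (rendered - target).abs() * mask
    return diff.sum() / mask.sum().clamp(min=1.0)
```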