In this work, we first show that the domain gap between the avatar and headset-camera images is one of the primary sources of difficulty: a transformer-based architecture achieves high accuracy on domain-consistent data, but degrades when the domain gap is reintroduced.
To circumvent this, we propose a novel volumetric avatar representation by extending mixtures of volumetric primitives to articulated objects.
To mitigate this asymmetry, we introduce a prior model that is conditioned on the runtime inputs and tie this prior space to the 3D face model via a normalizing flow in the latent space.
The core intuition behind our method is that better drivability and generalization can be achieved by disentangling the driving signals and remaining generative factors, which are not available during animation.
Social presence, the feeling of being there with a real person, will fuel the next generation of communication systems driven by digital humans in virtual reality (VR).
Based on this refined kinematic pose, the policy learns to compute dynamics-based control of the character (e.g., joint torques) that advances the current-frame pose estimate to that of the next frame.
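A common way to realize such dynamics-based control is a PD rule that drives the current joint state toward the next-frame kinematic target. The sketch below is illustrative only: the gains and the three-joint toy state are assumptions, not values from the paper.

```python
import numpy as np

def pd_torques(q_current, q_target, qdot, kp=300.0, kd=30.0):
    """PD control: torques that push the current joint angles toward the
    next-frame kinematic target while damping joint velocities.
    Gains kp/kd are illustrative, not taken from the paper."""
    return kp * (q_target - q_current) - kd * qdot

# Toy example with 3 joints.
q   = np.array([0.0, 0.1, -0.2])    # current-frame pose estimate
qn  = np.array([0.05, 0.12, -0.1])  # next-frame kinematic target
qd  = np.array([0.5, 0.0, -0.3])    # current joint velocities
tau = pd_torques(q, qn, qd)
```

In a learned controller, a policy network would output residual targets or gains on top of such a rule rather than raw torques directly.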
End-to-end training is made possible by differentiable registration and 3D triangulation modules.
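To see why a triangulation module can sit inside an end-to-end pipeline, note that classic DLT triangulation is just a least-squares solve, and every step (building the linear system, SVD) is differentiable. The two-view sketch below illustrates this; the specific camera setup in the test is an assumption for demonstration.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Least-squares (DLT) triangulation of one 3D point from two views.
    Builds the linear system from projection matrices P1, P2 and 2D
    observations x1, x2, then solves via SVD. Every operation here is
    differentiable, which is what lets gradients flow through a
    triangulation module during end-to-end training."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]               # null vector of A (homogeneous 3D point)
    return X[:3] / X[3]      # dehomogenize
```

Framework autodiff (PyTorch, JAX) would differentiate the same computation automatically; numpy is used here only to keep the sketch self-contained.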
Cross-domain image-to-image translation should satisfy two requirements: (1) preserve the information that is common to both domains, and (2) generate convincing images covering variations that appear in the target domain.
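The two requirements are commonly expressed as a two-term objective: a content-preservation term on domain-invariant features and an adversarial term on the translated image. The sketch below is a generic illustration of that structure; the feature extractor, discriminator, and weight `lam` are all assumptions, not the paper's exact formulation.

```python
import numpy as np

def translation_loss(content_src, content_trans, d_score_trans, lam=10.0):
    """Toy two-term cross-domain translation objective:
    (1) content term: domain-invariant features of the source and the
        translated image should match (preserves shared information);
    (2) adversarial term: the discriminator score on the translated
        image should look 'real' in the target domain (covers target
        variations). lam balances the two and is illustrative."""
    content_term = np.mean((content_src - content_trans) ** 2)  # requirement (1)
    adv_term = -np.mean(np.log(d_score_trans + 1e-8))           # requirement (2)
    return lam * content_term + adv_term
```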
OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation
In this paper, we present supervision-by-registration, an unsupervised approach to improve the precision of facial landmark detectors on both images and video.
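The registration signal can serve as supervision because it imposes a consistency constraint on unlabeled video: landmarks detected at frame t, advected by the estimated motion, should agree with the landmarks detected at frame t+1. A minimal sketch of such a consistency term, with the flow accessor as an assumed interface:

```python
import numpy as np

def registration_consistency_loss(lms_t, lms_t1, flow_at):
    """Consistency term in the spirit of supervision-by-registration:
    landmarks from frame t, moved by the registration/optical-flow
    estimate, should land on the frame t+1 detections. No annotations
    are required, so this can be minimized on unlabeled video.
    `flow_at` (assumed) returns the displacement at each landmark."""
    tracked = lms_t + flow_at(lms_t)  # landmarks propagated to frame t+1
    return np.mean(np.sum((tracked - lms_t1) ** 2, axis=-1))
```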
We present an approach to efficiently detect the 2D pose of multiple people in an image.
Pose Machines provide a sequential prediction framework for learning rich implicit spatial models.
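The sequential-prediction idea can be sketched as a cascade: each stage consumes the image features together with the previous stage's belief maps, so later stages can exploit implicit spatial context such as correlations between part locations. The callables below stand in for learned predictors and are assumptions for illustration.

```python
import numpy as np

def run_pose_machine(image_feats, stages):
    """Sequential prediction a la pose machines: each stage refines the
    previous stage's belief maps using the image features, letting later
    stages reason over the spatial layout implied by earlier predictions.
    `stages` is a list of callables standing in for learned predictors."""
    beliefs = np.zeros_like(image_feats)       # initial (uninformative) beliefs
    for stage in stages:
        beliefs = stage(image_feats, beliefs)  # refine with spatial context
    return beliefs
```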