Since humans interact with diverse objects every day, the holistic 3D capture of these interactions is important to understand and model human behaviour.
We achieve state-of-the-art rendering quality at a rendering speed of 60 FPS while training ~100x faster than previous work.
This design combines the strengths of SLAM and motion priors, which leads to significant improvements in human and camera motion estimation.
Through a hierarchical framework, we first learn skill priors for both body and hand movements in a decoupled setting.
We further evaluate our method qualitatively on real images and demonstrate that it generalizes across interaction types and object categories.
We introduce TempCLR, a new time-coherent contrastive learning approach for the structured regression task of 3D hand reconstruction.
In part this is because there exist no datasets with ground-truth 3D annotations for the study of physically consistent and synchronised motion of hands and articulated objects.
In fact, we demonstrate that these human-scene interactions (HSIs) can be leveraged to improve the 3D reconstruction of a scene from a monocular RGB video.
We introduce the dynamic grasp synthesis task: given an object with a known 6D pose and a grasp reference, our goal is to generate motions that move the object to a target 6D pose.
For Minimally-Clothed regions, we define the DSR-MC loss, which encourages a tight match between a rendered SMPL body and the minimally-clothed regions of the image.
Ranked #43 on 3D Human Pose Estimation on 3DPW (using extra training data)
We then train a novel network that concatenates the camera calibration to the image features and uses these together to regress 3D body shape and pose.
Ranked #1 on 3D Multi-Person Pose Estimation on AGORA
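The camera-conditioned regression mentioned above (concatenating camera calibration to image features before regressing shape and pose) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the feature size, the three calibration values, the function names `concat_camera_features` and `linear_head`, and the single linear layer standing in for the regressor. It is not the paper's architecture.

```python
import random


def concat_camera_features(image_features, camera_params):
    """Append camera calibration values to an image feature vector."""
    return list(image_features) + list(camera_params)


def linear_head(features, weights, bias):
    """Toy regression head: one output per weight row (hypothetical stand-in)."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


rng = random.Random(0)

# Hypothetical backbone output: an 8-dimensional image feature vector.
image_features = [rng.gauss(0.0, 1.0) for _ in range(8)]

# Hypothetical calibration values, e.g. normalized focal length, pitch, roll.
camera_params = [0.5, -0.1, 0.02]

# The regressor sees image evidence and camera calibration together.
fused = concat_camera_features(image_features, camera_params)

# Randomly initialized weights for a 4-dimensional toy output.
n_outputs = 4
weights = [[rng.gauss(0.0, 0.1) for _ in fused] for _ in range(n_outputs)]
bias = [0.0] * n_outputs

prediction = linear_head(fused, weights, bias)
```

The point of the design is that the regressor can interpret the same image features differently depending on the camera, rather than assuming one fixed camera for all images.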
In natural conversation and interaction, our hands often overlap or are in contact with each other.
Ranked #5 on 3D Interacting Hand Pose Estimation on InterHand2.6M
Despite significant progress, we show that state-of-the-art 3D human pose and shape estimation methods remain sensitive to partial occlusion and can produce dramatically wrong predictions even when much of the body is observable.
Ranked #2 on 3D Multi-Person Pose Estimation on AGORA
In this work, we propose a new training regularizer that aims to minimize the probabilistic expected training loss of a DNN subject to a generic Gaussian input.
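To make "expected training loss subject to a Gaussian input" concrete, the sketch below uses a deliberately simplified setting that is an assumption for illustration, not the paper's method: a linear model with squared loss and isotropic Gaussian noise of standard deviation `sigma` added to the input. In this toy case the expectation has a closed form, (w·x − y)² + σ²‖w‖², which the code compares against a Monte Carlo estimate. The function names are hypothetical.

```python
import random


def expected_sq_loss_linear(w, x, y, sigma):
    """Closed-form E[(w.(x + eps) - y)^2] for eps ~ N(0, sigma^2 I)."""
    clean_error = sum(wi * xi for wi, xi in zip(w, x)) - y
    noise_term = sigma ** 2 * sum(wi * wi for wi in w)
    return clean_error ** 2 + noise_term


def monte_carlo_sq_loss(w, x, y, sigma, n_samples, rng):
    """Estimate the same expectation by sampling noisy inputs."""
    total = 0.0
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        error = sum(wi * ni for wi, ni in zip(w, noisy)) - y
        total += error ** 2
    return total / n_samples


# Toy values chosen only for illustration.
w = [0.5, -1.0, 2.0]
x = [1.0, 2.0, 0.5]
y = 0.0
sigma = 0.1

analytic = expected_sq_loss_linear(w, x, y, sigma)
sampled = monte_carlo_sq_loss(w, x, y, sigma, 20000, random.Random(0))
```

Minimizing the analytic expression penalizes large weights in proportion to the noise variance, which is one way to see why training on an expected loss acts as a regularizer. For a deep network no such simple closed form is available in general, so this sketch should be read only as intuition.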
Human motion is fundamental to understanding behavior.
Ranked #32 on Monocular 3D Human Pose Estimation on Human3.6M
Training accurate 3D human pose estimators requires large amounts of 3D ground-truth data, which is costly to collect.
Ranked #1 on Weakly-supervised 3D Human Pose Estimation on Human3.6M (Number of Frames Per View metric)
In this paper, we present MultiPoseNet, a novel bottom-up multi-person pose estimation architecture that combines a multi-task model with a novel assignment method.
Ranked #8 on Multi-Person Pose Estimation on COCO