Marker-less monocular 3D human motion capture (MoCap) with scene interactions is a challenging research topic relevant for extended reality, robotics and virtual avatar generation.
We evaluate GraviCap on a new dataset with ground-truth annotations for persons and different objects undergoing free flight.
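The free-flight segments are what make such annotations useful for constraining monocular reconstruction: neglecting drag, a freely flying object follows a known parabola under gravity. A standard formulation (our notation, not quoted from the paper) is:

```latex
% Ballistic model for a free-flight segment (drag neglected);
% p_0, v_0 are release position and velocity, g the gravity vector
% with magnitude about 9.81 m/s^2:
\mathbf{p}(t) = \mathbf{p}_0 + \mathbf{v}_0\, t + \tfrac{1}{2}\,\mathbf{g}\, t^2
```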
To address the limitations of the existing methods, we develop HandVoxNet++, a voxel-based deep network with 3D and graph convolutions trained in a fully supervised manner.
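A minimal PyTorch sketch of the two building blocks named above, 3D convolutions over a voxel grid and a graph convolution over mesh vertices, follows; the layer widths and the normalized-adjacency input are illustrative assumptions rather than the published HandVoxNet++ architecture:

```python
# Illustrative sketch only; layer sizes and the mesh topology are
# assumptions, not the published HandVoxNet++ design.
import torch
import torch.nn as nn

class VoxelEncoder(nn.Module):
    """3D convolutions over a voxelized input grid."""
    def __init__(self, in_ch=1, feat=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, feat, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(feat, feat, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, vox):          # vox: (B, 1, D, H, W)
        return self.net(vox)

class GraphConv(nn.Module):
    """Simple graph convolution: mix neighbour features through a
    pre-normalized adjacency matrix (Kipf & Welling style)."""
    def __init__(self, in_f, out_f):
        super().__init__()
        self.lin = nn.Linear(in_f, out_f)

    def forward(self, x, adj_norm):  # x: (B, V, F), adj_norm: (V, V)
        return torch.relu(self.lin(adj_norm @ x))
```

In such a design, the voxel branch captures volumetric shape while the graph branch refines per-vertex estimates along the mesh connectivity.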
The problem of simultaneously and rigidly aligning multiple unordered point sets, without bias towards any single input, has recently attracted increasing interest, and several reliable methods have been proposed.
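One classical way to avoid such bias, in the spirit of generalized Procrustes analysis, is to align every set to an evolving mean shape rather than to one privileged reference. The sketch below illustrates this under the simplifying assumption of known point correspondences, which methods for unordered sets must additionally estimate:

```python
# Unbiased multi-set rigid alignment sketch (generalized Procrustes
# style). Assumes all sets have corresponding points in the same order.
import numpy as np

def kabsch(src, dst):
    """Optimal rotation R and translation t mapping src onto dst (N x 3)."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, mu_d - R @ mu_s

def align_unbiased(point_sets, iters=20):
    sets = [p.copy() for p in point_sets]
    for _ in range(iters):
        mean_shape = np.mean(sets, axis=0)   # evolving consensus shape
        for i, p in enumerate(sets):
            R, t = kabsch(p, mean_shape)
            sets[i] = p @ R.T + t
    return sets
```

Because every set is registered against the consensus, no input acts as a fixed template, which is the unbiasedness property the sentence above refers to.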
We present a new trainable system for physically plausible markerless 3D human motion capture, which achieves state-of-the-art results in a broad range of challenging scenarios.
We therefore present PhysCap, the first algorithm for physically plausible, real-time and marker-less 3D human motion capture with a single colour camera at 25 fps.
The input to our method is a 3D voxelized depth map, and we rely on two hand shape representations.
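For illustration, the sketch below shows one plausible way to obtain such a voxelized grid from a raw depth map: back-project pixels through the camera intrinsics and scatter the resulting points into a binary occupancy volume. The intrinsics, grid resolution and metric bounds are illustrative assumptions, not values from the method:

```python
# Illustrative voxelization of a depth map; fx, fy, cx, cy, the grid
# size and the metric bounds are assumed values.
import numpy as np

def voxelize_depth(depth, fx, fy, cx, cy, grid=64,
                   bounds=((-0.2, 0.2),) * 3):
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    valid = z > 0
    # Back-project valid pixels to camera-space 3D points.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x[valid], y[valid], z[valid]], axis=-1)
    pts -= pts.mean(0)                        # centre the point cloud
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    idx = ((pts - lo) / (hi - lo) * grid).astype(int)
    inside = np.all((idx >= 0) & (idx < grid), axis=1)
    i, j, k = idx[inside].T
    vox = np.zeros((grid,) * 3, dtype=np.float32)
    vox[i, j, k] = 1.0                        # binary occupancy
    return vox
```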
We introduce a supervised-learning framework for non-rigid point set alignment of a new kind, Displacements on Voxels Networks (DispVoxNets), which abstracts away from the point set representation and regresses 3D displacement fields on regularly sampled proxy 3D voxel grids.
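The core idea can be made concrete with a small sketch: points are mapped into a proxy voxel grid, a network predicts one 3D displacement per voxel, and each point moves by the displacement of its voxel. The grid resolution and the nearest-voxel lookup (trilinear interpolation would be smoother) are simplifying assumptions:

```python
# Applying a regressed per-voxel displacement field to a point set;
# nearest-voxel lookup is a simplification of the general scheme.
import numpy as np

def apply_voxel_displacements(points, disp_field, lo, hi):
    """points: (N, 3); disp_field: (G, G, G, 3), e.g. the output of a
    3D CNN; lo/hi: metric bounds of the proxy grid (3-vectors)."""
    g = disp_field.shape[0]
    idx = ((points - lo) / (hi - lo) * g).astype(int).clip(0, g - 1)
    return points + disp_field[idx[:, 0], idx[:, 1], idx[:, 2]]
```

At training time, the displacement field itself would be regressed by a network from the voxelized source and target sets, which is what lets the approach abstract away from any particular point set representation.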
The majority of existing methods for non-rigid 3D surface regression from monocular 2D images require an object template or point tracks over multiple frames as input, and they remain far from real-time processing rates.
Monocular dense 3D reconstruction of deformable objects is a hard ill-posed problem in computer vision.