We introduce the first dense neural non-rigid structure from motion (N-NRSfM) approach, which can be trained end-to-end in an unsupervised manner from 2D point tracks.
3D hand reconstruction from images is a widely-studied problem in computer vision and graphics, and has a particularly high relevance for virtual and augmented reality.
Marker-less monocular 3D human motion capture (MoCap) with scene interactions is a challenging research topic relevant for extended reality, robotics and virtual avatar generation.
Motion segmentation is a challenging problem that seeks to identify independent motions in two or more input images.
We present a hybrid classical-quantum framework based on the Frank-Wolfe algorithm, Q-FW, for solving quadratic, linearly-constrained, binary optimization problems on quantum annealers (QA).
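The classical core of such a framework is the Frank-Wolfe iteration itself. The minimal sketch below illustrates it on a box relaxation of a binary quadratic program; the function name and setup are our own illustration, not the paper's Q-FW.

```python
import numpy as np

def frank_wolfe_box_qp(Q, c, n_iters=100):
    """Minimize x^T Q x + c^T x over the box [0, 1]^n with Frank-Wolfe.

    Purely classical sketch: each step solves a linear minimization
    oracle (LMO) over the box and moves toward its solution. It only
    conveys the structure of the iteration, not the hybrid Q-FW method.
    """
    n = len(c)
    x = np.full(n, 0.5)                      # feasible starting point
    for t in range(n_iters):
        grad = (Q + Q.T) @ x + c             # gradient of the quadratic
        s = (grad < 0).astype(float)         # LMO over [0,1]^n picks a vertex
        gamma = 2.0 / (t + 2.0)              # standard diminishing step size
        x = x + gamma * (s - x)              # convex combination stays feasible
    return x
```

Each iterate remains inside the box by construction, which is the main appeal of Frank-Wolfe for constrained problems.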
In contrast to previous works, this paper proposes a new SfT approach that explains 2D observations through physical simulations accounting for forces and material properties.
We present Playable Environments - a new representation for interactive video generation and manipulation in space and time.
Photorealistic editing of outdoor scenes from photographs requires a profound understanding of the image formation process and an accurate estimation of the scene geometry, reflectance and illumination.
1 code implementation • 10 Nov 2021 • Ayush Tewari, Justus Thies, Ben Mildenhall, Pratul Srinivasan, Edgar Tretschk, Yifan Wang, Christoph Lassner, Vincent Sitzmann, Ricardo Martin-Brualla, Stephen Lombardi, Tomas Simon, Christian Theobalt, Matthias Niessner, Jonathan T. Barron, Gordon Wetzstein, Michael Zollhoefer, Vladislav Golyanik
The reconstruction of such a scene representation from observations using differentiable rendering losses is known as inverse graphics or inverse rendering.
We evaluate GraviCap on a new dataset with ground-truth annotations for persons and different objects undergoing free flights.
In this work, we address such problems with emerging quantum computing technology and propose several reformulations of QAPs as unconstrained problems suitable for efficient execution on quantum hardware.
Even holding a mobile phone camera in front of the face while sitting for a long duration is inconvenient.
To address the limitations of the existing methods, we develop HandVoxNet++, i.e., a voxel-based deep network with 3D and graph convolutions trained in a fully supervised manner.
The problem of simultaneous rigid alignment of multiple unordered point sets which is unbiased towards any of the inputs has recently attracted increasing interest, and several reliable methods have recently been proposed.
Finding shape correspondences can be formulated as an NP-hard quadratic assignment problem (QAP) that becomes infeasible for shapes with high sampling density.
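To make the combinatorial difficulty concrete, the sketch below evaluates a distance-preservation QAP cost and solves a toy instance by brute force. The particular cost and names are our illustrative assumptions; the factorial search space is what becomes infeasible at high sampling density.

```python
import numpy as np
from itertools import permutations

def qap_cost(A, B, perm):
    """Distance-preservation cost: sum_ij (A[i,j] - B[perm[i],perm[j]])^2.

    A and B are pairwise distance matrices of two shapes; perm assigns
    each vertex of shape A to a vertex of shape B. Zero cost means a
    perfect isometric match.
    """
    idx = np.asarray(perm)
    return float(np.sum((A - B[np.ix_(idx, idx)]) ** 2))

def brute_force_qap(A, B):
    """Exhaustive search over all n! assignments -- feasible only for
    tiny n, which is exactly why dense shapes make the QAP intractable."""
    n = A.shape[0]
    return min(permutations(range(n)), key=lambda p: qap_cost(A, B, p))
```

On three points with pairwise distances {1, 2, 3}, the brute-force search recovers the unique zero-cost assignment.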
We present a new trainable system for physically plausible markerless 3D human motion capture, which achieves state-of-the-art results in a broad range of challenging scenarios.
This paper introduces the first differentiable simulator of event streams, i.e., streams of asynchronous brightness change signals recorded by event cameras.
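For intuition, the standard (non-differentiable) event generation model can be sketched in a few lines: a pixel fires whenever its log brightness drifts a fixed contrast threshold away from its reference level. This is the generic event-camera model, not the paper's differentiable simulator.

```python
import numpy as np

def simulate_events(log_frames, timestamps, threshold=0.2):
    """Generate an event stream from a sequence of log-brightness frames.

    A pixel emits events when its log brightness moves at least
    `threshold` away from the level at its last event; polarity is the
    sign of the change. Several threshold crossings within one frame
    interval produce several events with the frame's timestamp.
    """
    ref = log_frames[0].copy()               # per-pixel reference level
    events = []                              # tuples (t, y, x, polarity)
    for frame, t in zip(log_frames[1:], timestamps[1:]):
        delta = frame - ref
        num = np.floor_divide(np.abs(delta), threshold).astype(int)
        ys, xs = np.nonzero(num)
        for y, x in zip(ys, xs):
            pol = 1 if delta[y, x] > 0 else -1
            for _ in range(num[y, x]):
                events.append((t, int(y), int(x), pol))
            ref[y, x] += pol * num[y, x] * threshold  # advance reference
    return events
```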
We address these limitations and present a generative model for images of dressed humans offering control over pose, local body part appearance and garment style.
Photo-realistic re-rendering of a human from a single image with explicit control over body pose, shape and appearance enables a wide range of applications, such as human appearance transfer, virtual try-on, motion imitation, and novel view synthesis.
Human re-rendering from a single image is a starkly under-constrained problem, and state-of-the-art algorithms often exhibit undesired artefacts, such as over-smoothing, unrealistic distortions of the body parts and garments, or implausible changes of the texture.
We show that a single handheld consumer-grade camera is sufficient to synthesize sophisticated renderings of a dynamic scene from novel virtual camera views, e.g., a "bullet-time" video effect.
We address these limitations for the first time in the literature and present a new framework which performs high-fidelity and temporally-consistent human motion transfer with natural pose-dependent non-rigid deformations, for several types of loose garments.
Because event cameras produce a different data modality than classical cameras, existing methods cannot be directly applied to event streams or re-trained on them.
We present a new pose transfer method for synthesizing a human animation from a single image of a person controlled by a sequence of body poses.
This article introduces a new physics-based method for rigid point set alignment called Fast Gravitational Approach (FGA).
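The physical intuition behind such gravitational alignment can be sketched with a translation-only toy: every template point is attracted to every reference point by a softened inverse-square force, and the template shifts rigidly along the net force. The function below is our illustrative simplification, not the actual FGA, which also recovers rotation with proper particle dynamics.

```python
import numpy as np

def gravitational_align_translation(template, reference, n_steps=200,
                                    lr=0.05, eps=0.5):
    """Translation-only toy sketch of gravitational point-set alignment.

    The rigid translation t follows the net softened inverse-square
    attraction of the template towards the reference set; eps prevents
    singular forces at small distances.
    """
    t = np.zeros(template.shape[1])          # current rigid translation
    for _ in range(n_steps):
        X = template + t
        diff = reference[None, :, :] - X[:, None, :]   # (n, m, d) pair offsets
        dist2 = np.sum(diff ** 2, axis=-1) + eps ** 2  # softened distances
        force = (diff / dist2[..., None] ** 1.5).sum(axis=(0, 1))
        t = t + lr * force / len(template)             # move along net force
    return template + t
```

When the two sets are identical up to a small shift, the net force vanishes exactly at zero offset, so the template settles onto the reference.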
We therefore present PhysCap, the first algorithm for physically plausible, real-time and marker-less human 3D motion capture with a single colour camera at 25 fps.
At the level of patches, objects across different categories share similarities, which leads to more generalizable models.
The input to our method is a 3D voxelized depth map, and we rely on two hand shape representations.
This work describes Barnes-Hut Rigid Gravitational Approach (BH-RGA) -- a new rigid point set registration method relying on principles of particle dynamics.
The reasons for the slow dissemination are the severe ill-posedness, the high sensitivity to motion and deformation cues, and the difficulty of obtaining reliable point tracks in the vast majority of practical scenarios.
We introduce a supervised-learning framework for non-rigid point set alignment of a new kind - Displacements on Voxels Networks (DispVoxNets) - which abstracts away from the point set representation and regresses 3D displacement fields on regularly sampled proxy 3D voxel grids.
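The voxel-grid abstraction can be sketched as follows: points are mapped onto a regular occupancy grid, and a displacement field defined on the same grid is looked up to move them. The helpers below are illustrative only (nearest-voxel lookup instead of interpolation, and a hand-made field in place of the regressed one).

```python
import numpy as np

def voxelize(points, grid_size=32):
    """Map a normalized point set in [0,1]^3 to a binary occupancy grid."""
    idx = np.clip((points * grid_size).astype(int), 0, grid_size - 1)
    grid = np.zeros((grid_size,) * 3)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return grid

def apply_voxel_displacements(points, disp_field, grid_size=32):
    """Displace points by nearest-voxel lookup into a (G, G, G, 3) field.

    A network in the DispVoxNets spirit would regress disp_field from
    occupancy grids; the lookup here keeps the sketch short.
    """
    idx = np.clip((points * grid_size).astype(int), 0, grid_size - 1)
    return points + disp_field[idx[:, 0], idx[:, 1], idx[:, 2]]
```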
Mesh autoencoders are commonly used for dimensionality reduction, sampling and mesh modeling.
SfAM is highly robust to noisy 2D annotations, generalizes to arbitrary objects and does not rely on training data, as shown in extensive experiments on public benchmarks and real video sequences.
The majority of the existing methods for non-rigid 3D surface regression from monocular 2D images require an object template or point tracks over multiple frames as an input, and are still far from real-time processing rates.
Monocular dense 3D reconstruction of deformable objects is a hard ill-posed problem in computer vision.
We introduce a novel multiframe scene flow approach that jointly optimizes the consistency of the patch appearances and their local rigid motions from RGB-D image sequences.