Implicit neural representations of 3D shapes form strong priors that are useful for various applications, such as single and multiple view 3D reconstruction.
Our model can disentangle the geometry and appearance variations in the scene, i.e., we can independently sample from the geometry and appearance spaces of the generative model.
no code implementations • 10 Nov 2021 • Ayush Tewari, Justus Thies, Ben Mildenhall, Pratul Srinivasan, Edgar Tretschk, Yifan Wang, Christoph Lassner, Vincent Sitzmann, Ricardo Martin-Brualla, Stephen Lombardi, Tomas Simon, Christian Theobalt, Matthias Niessner, Jonathan T. Barron, Gordon Wetzstein, Michael Zollhoefer, Vladislav Golyanik
The reconstruction of such a scene representation from observations using differentiable rendering losses is known as inverse graphics or inverse rendering.
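As a toy illustration of the idea (not the method of any paper listed here), the sketch below recovers a single scene parameter by gradient descent on a photometric loss between rendered and observed intensities. The one-parameter "renderer" (albedo times a known shading term), the function names, and all numeric values are assumptions chosen for brevity.

```python
# Toy inverse rendering: recover a scalar albedo from observed pixel
# intensities, given known per-pixel shading. Everything here is a
# hypothetical, minimal stand-in for a real differentiable renderer.

def render(albedo, shading):
    """Forward model: intensity = albedo * shading for each pixel."""
    return [albedo * s for s in shading]

def photometric_loss(albedo, shading, observed):
    """Mean squared error between the rendering and the observation."""
    rendered = render(albedo, shading)
    return sum((r - o) ** 2 for r, o in zip(rendered, observed)) / len(observed)

def loss_gradient(albedo, shading, observed):
    """Analytic derivative of the loss with respect to the albedo:
    d/da mean((a*s - o)^2) = mean(2*s*(a*s - o))."""
    return sum(2.0 * s * (albedo * s - o)
               for s, o in zip(shading, observed)) / len(observed)

def fit_albedo(shading, observed, steps=500, lr=0.5, init=0.0):
    """Plain gradient descent on the photometric loss."""
    albedo = init
    for _ in range(steps):
        albedo -= lr * loss_gradient(albedo, shading, observed)
    return albedo

# Synthetic observation produced with a known ground-truth albedo.
shading = [0.2, 0.5, 0.9, 1.0]
true_albedo = 0.7
observed = render(true_albedo, shading)

estimate = fit_albedo(shading, observed)
```

In practice the forward model is a full renderer with many parameters, and the gradient is obtained by automatic differentiation instead of being derived by hand.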
We demonstrate that it suffices to train our temporal architecture on only 10 minutes of footage of 1 subject for about 6 hours.
Even holding a mobile phone camera in front of the face while sitting for a long duration is not convenient.
1 code implementation • 13 Mar 2021 • Mallikarjun B R, Ayush Tewari, Abdallah Dib, Tim Weyrich, Bernd Bickel, Hans-Peter Seidel, Hanspeter Pfister, Wojciech Matusik, Louis Chevallier, Mohamed Elgharib, Christian Theobalt
We present an approach for high-quality intuitive editing of the camera viewpoint and scene illumination in a portrait image.
We show that a single handheld consumer-grade camera is sufficient to synthesize sophisticated renderings of a dynamic scene from novel virtual camera views, e.g., a "bullet-time" video effect.
We present the first method for real-time full body capture that estimates shape and motion of body and hands together with a dynamic 3D face model from a single color image.
Our approach has the following favorable properties: (i) It is the first full head morphable model that includes hair.
Our network design and loss functions ensure a disentangled parameterization of not only identity and albedo, but also, for the first time, an expression basis.
We present the first approach for embedding real portrait images in the latent space of StyleGAN, which allows for intuitive editing of the head pose, facial expression, and scene illumination in the image.
The reflectance field of a face describes the reflectance properties responsible for complex lighting effects including diffuse, specular, inter-reflection, and self-shadowing.
At the level of patches, objects across different categories share similarities, which leads to more generalizable models.
no code implementations • 8 Apr 2020 • Ayush Tewari, Ohad Fried, Justus Thies, Vincent Sitzmann, Stephen Lombardi, Kalyan Sunkavalli, Ricardo Martin-Brualla, Tomas Simon, Jason Saragih, Matthias Nießner, Rohit Pandey, Sean Fanello, Gordon Wetzstein, Jun-Yan Zhu, Christian Theobalt, Maneesh Agrawala, Eli Shechtman, Dan B. Goldman, Michael Zollhöfer
Neural rendering is a new and rapidly emerging field that combines generative machine learning techniques with physical knowledge from computer graphics, e.g., by the integration of differentiable rendering into network training.
StyleGAN generates photorealistic portrait images of faces with eyes, teeth, hair and context (neck, shoulders, background), but lacks a rig-like control over semantic face parameters that are interpretable in 3D, such as face pose, expressions, and scene illumination.
Neural Voice Puppetry has a variety of use-cases, including audio-driven video avatars, video dubbing, and text-driven video synthesis of a talking head.
1 code implementation • 3 Sep 2019 • Bernhard Egger, William A. P. Smith, Ayush Tewari, Stefanie Wuhrer, Michael Zollhoefer, Thabo Beeler, Florian Bernard, Timo Bolkart, Adam Kortylewski, Sami Romdhani, Christian Theobalt, Volker Blanz, Thomas Vetter
In this paper, we provide a detailed survey of 3D Morphable Face Models over the 20 years since they were first proposed.
To edit a video, the user only has to edit the transcript, and an optimization strategy then chooses segments of the input corpus as base material.
Our lightweight setup allows operations in uncontrolled environments, and lends itself to telepresence applications such as video-conferencing from dynamic environments.
Mesh autoencoders are commonly used for dimensionality reduction, sampling and mesh modeling.
In contrast, we propose multi-frame video-based self-supervised training of a deep network that (i) learns a face identity model both in shape and appearance while (ii) jointly learning to reconstruct 3D faces.
no code implementations • 29 May 2018 • Hyeongwoo Kim, Pablo Garrido, Ayush Tewari, Weipeng Xu, Justus Thies, Matthias Nießner, Patrick Pérez, Christian Richardt, Michael Zollhöfer, Christian Theobalt
In order to enable source-to-target video re-animation, we render a synthetic target video with the reconstructed head animation parameters from a source video, and feed it into the trained network, thus taking full control of the target.
As more and more personal photos are shared and tagged in social media, avoiding privacy risks such as unintended recognition becomes increasingly challenging.
To alleviate this problem, we present the first approach that jointly learns 1) a regressor for face shape, expression, reflectance and illumination on the basis of 2) a concurrently learned parametric face model.
In contrast, we propose to recover high-quality facial pose, shape, expression, reflectance and illumination using a deep neural network that is trained using a large, synthetically created training corpus.
In this work we propose a novel model-based deep convolutional autoencoder that addresses the highly challenging problem of reconstructing a 3D human face from a single in-the-wild color image.