We demonstrate that stochastic conditioning significantly improves the 3D consistency of a naive sampler for an image-to-image diffusion model, which involves conditioning on a single fixed view.
By rendering raw output images from the resulting NeRF, we can perform novel high dynamic range (HDR) view synthesis tasks.
1 code implementation • 10 Nov 2021 • Ayush Tewari, Justus Thies, Ben Mildenhall, Pratul Srinivasan, Edgar Tretschk, Yifan Wang, Christoph Lassner, Vincent Sitzmann, Ricardo Martin-Brualla, Stephen Lombardi, Tomas Simon, Christian Theobalt, Matthias Niessner, Jonathan T. Barron, Gordon Wetzstein, Michael Zollhoefer, Vladislav Golyanik
The reconstruction of such a scene representation from observations using differentiable rendering losses is known as inverse graphics or inverse rendering.
A common approach to reconstruct such non-rigid scenes is through the use of a learned deformation field mapping from coordinates in each input image into a canonical template coordinate space.
We investigate the use of Neural Radiance Fields (NeRF) to learn high quality 3D object category models from collections of input images.
Obtaining high-quality 3D reconstructions of room-scale scenes is of paramount importance for upcoming applications in AR or VR.
Mip-NeRF is also able to match the accuracy of a brute-force supersampled NeRF on our multiscale dataset while being 22x faster.
Unlike neural scene representation work that optimizes per-scene functions for rendering, we learn a generic view interpolation function that generalizes to novel scenes.
We demonstrate in several experiments the effectiveness of our approach in both synthetic and real images.
Many historical people were only ever captured by old, faded, black and white photos, that are distorted due to the limitations of early cameras and the passage of time.
Removing objects from images is a challenging problem that is important for many applications, including mixed reality.
We present the first method capable of photorealistically reconstructing deformable scenes using photos/videos captured casually from mobile phones.
We present a learning-based method for synthesizing novel views of complex scenes using only unstructured collections of in-the-wild photographs.
no code implementations • 8 Apr 2020 • Ayush Tewari, Ohad Fried, Justus Thies, Vincent Sitzmann, Stephen Lombardi, Kalyan Sunkavalli, Ricardo Martin-Brualla, Tomas Simon, Jason Saragih, Matthias Nießner, Rohit Pandey, Sean Fanello, Gordon Wetzstein, Jun-Yan Zhu, Christian Theobalt, Maneesh Agrawala, Eli Shechtman, Dan B. Goldman, Michael Zollhöfer
Neural rendering is a new and rapidly emerging field that combines generative machine learning techniques with physical knowledge from computer graphics, e. g., by the integration of differentiable rendering into network training.
Then we train a generator to transform an input image along with a style-code to the output domain.
This paper introduces the largest and most diverse collection of rectified stereo image pairs to the research community, KeystoneDepth, consisting of tens of thousands of stereographs of historical people, events, objects, and scenes between 1860 and 1963.
no code implementations • • Rohit Pandey, Anastasia Tkach, Shuoran Yang, Pavel Pidlypenskyi, Jonathan Taylor, Ricardo Martin-Brualla, Andrea Tagliasacchi, George Papandreou, Philip Davidson, Cem Keskin, Shahram Izadi, Sean Fanello
The key insight is to leverage previously seen "calibration" images of a given user to extrapolate what should be rendered in a novel viewpoint from the data available in the sensor.
Starting from internet photos of a tourist landmark, we apply traditional 3D reconstruction to register the photos and approximate the scene as a point cloud.
no code implementations • 12 Nov 2018 • Ricardo Martin-Brualla, Rohit Pandey, Shuoran Yang, Pavel Pidlypenskyi, Jonathan Taylor, Julien Valentin, Sameh Khamis, Philip Davidson, Anastasia Tkach, Peter Lincoln, Adarsh Kowdle, Christoph Rhemann, Dan B. Goldman, Cem Keskin, Steve Seitz, Shahram Izadi, Sean Fanello
We take the novel approach to augment such real-time performance capture systems with a deep architecture that takes a rendering from an arbitrary viewpoint, and jointly performs completion, super resolution, and denoising of the imagery in real-time.