We observe that the majority of artifacts in sparse input scenarios are caused by errors in the estimated scene geometry, and by divergent behavior at the start of training.
We present NeSF, a method for producing 3D semantic fields from posed RGB images alone.
no code implementations • 25 Nov 2021 • Mehdi S. M. Sajjadi, Henning Meyer, Etienne Pot, Urs Bergmann, Klaus Greff, Noha Radwan, Suhani Vora, Mario Lucic, Daniel Duckworth, Alexey Dosovitskiy, Jakob Uszkoreit, Thomas Funkhouser, Andrea Tagliasacchi
In this work, we propose the Scene Representation Transformer (SRT), a method which processes posed or unposed RGB images of a new area, infers a "set-latent scene representation", and synthesises novel views, all in a single feed-forward pass.
We present a learning-based method for synthesizing novel views of complex scenes using only unstructured collections of in-the-wild photographs.
Variational Autoencoders (VAEs) provide a theoretically-backed and popular framework for deep generative models.
State-of-the-art video restoration methods integrate optical flow estimation networks to utilize temporal information.
Together with a video discriminator, we also propose additional loss functions to further reinforce temporal consistency in the generated sequences.
Recent advances in generative modeling have led to an increased interest in the study of statistical divergences as a means of model comparison.
A possible explanation for training instabilities is the inherent imbalance between the networks: While the discriminator is trained directly on both real and fake samples, the generator only has control over the fake samples it produces since the real data distribution is fixed by the choice of a given dataset.
Recent advances in video super-resolution have shown that convolutional neural networks combined with motion compensation are able to merge information from multiple low-resolution (LR) frames to generate high-quality images.
Ranked #4 on Video Super-Resolution on Vid4 - 4x upscaling
Single image super-resolution is the task of inferring a high-resolution image from a single low-resolution input.
Light field photography captures rich structural information that may facilitate a number of traditional image processing and computer vision tasks.
Peer grading is the process of students reviewing each other's work, such as homework submissions, and has lately become a popular mechanism used in massive open online courses (MOOCs).