We propose a deep learning method for three-dimensional reconstruction in low-dose helical cone-beam computed tomography.
This paper presents CQT-Diff, a data-driven generative audio model that, once trained, can be used to solve a variety of audio inverse problems in a problem-agnostic setting.
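The problem-agnostic part is worth unpacking: the diffusion prior is trained once, and each new inverse problem only supplies its own degradation operator at sampling time. Below is a minimal, generic posterior-guidance sketch of that idea; the function names, the Euler update, and the guidance rule are illustrative assumptions, not the paper's exact sampler.

```python
import torch

def solve_inverse_problem(denoiser, degrade, y, sigmas, zeta=1.0):
    """Reconstruct a signal x consistent with observations y = degrade(x),
    using a pretrained diffusion denoiser as the prior. `degrade` is the
    only problem-specific component (e.g. masking for audio inpainting)."""
    x = sigmas[0] * torch.randn_like(y)  # assumes x and y share a shape
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        x = x.detach().requires_grad_(True)
        denoised = denoiser(x, sigma)    # prior's estimate of the clean signal
        # Data-consistency guidance: gradient of the observation error.
        err = ((degrade(denoised) - y) ** 2).sum()
        grad = torch.autograd.grad(err, x)[0]
        # Euler step toward the next (lower) noise level, plus guidance.
        d = (x - denoised) / sigma
        x = x + (sigma_next - sigma) * d - zeta * grad
    return x.detach()
```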
We introduce the problem of disentangling time-lapse sequences in a way that allows separate, after-the-fact control of overall trends, cyclic effects, and random effects in the images, and describe a technique based on data-driven generative models that achieves this goal.
Existing video generation methods often fail to produce new content as a function of time while maintaining consistencies expected in real environments, such as plausible dynamics and object persistence.
We observe that despite their hierarchical convolutional nature, the synthesis process of typical generative adversarial networks depends on absolute pixel coordinates in an unhealthy manner.
We present a modular differentiable renderer design that yields performance superior to previous methods by leveraging existing, highly optimized hardware graphics pipelines.
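A minimal sketch of how such a modular pipeline composes, written against nvdiffrast's PyTorch bindings as published in the public repository (function names per that repo; the scene setup is illustrative, and all tensors are assumed to live on the GPU):

```python
import torch
import nvdiffrast.torch as dr

glctx = dr.RasterizeCudaContext()  # context for the hardware rasterizer

def render(pos, tri, col, resolution=512):
    """pos: [1, V, 4] clip-space vertices, tri: [T, 3] int32, col: [1, V, 3]."""
    # Stage 1: rasterize triangles into per-pixel barycentrics / triangle IDs.
    rast, _ = dr.rasterize(glctx, pos, tri, resolution=[resolution, resolution])
    # Stage 2: interpolate per-vertex attributes (here, colors) over pixels.
    color, _ = dr.interpolate(col, rast, tri)
    # Stage 3: analytic antialiasing, which is what makes silhouette edges
    # differentiable with respect to vertex positions.
    color = dr.antialias(color, rast, pos, tri)
    return color  # [1, H, W, 3], differentiable w.r.t. pos and col
```

Each stage is an independent differentiable primitive, which is the sense in which the design is modular: they can be rearranged or interleaved with arbitrary PyTorch code.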
We also find that the widely used CIFAR-10 is, in fact, a limited data benchmark, and improve the record FID from 5.59 to 2.42.
This paper describes a simple technique to analyze Generative Adversarial Networks (GANs) and create interpretable controls for image synthesis, such as change of viewpoint, aging, lighting, and time of day.
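Concretely, the technique amounts to running PCA on intermediate latent codes and using the principal components as edit directions. A minimal sketch, assuming a StyleGAN-like generator with hypothetical `G.z_dim`, `G.mapping`, and `G.synthesis` attributes and 2D latent codes:

```python
import torch

def find_directions(G, n_samples=10_000, n_components=20):
    z = torch.randn(n_samples, G.z_dim)
    with torch.no_grad():
        w = G.mapping(z)                  # intermediate latents, [N, w_dim]
    w_mean = w.mean(dim=0, keepdim=True)
    # PCA via randomized SVD of the centered latent matrix.
    _, _, V = torch.pca_lowrank(w - w_mean, q=n_components)
    return w_mean, V                      # columns of V are edit directions

def edit(G, w, V, component, sigma):
    # Move a latent along one principal direction; which semantic edit
    # (viewpoint, aging, lighting, ...) it produces is found by inspection.
    return G.synthesis(w + sigma * V[:, component])
```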
Overall, our improved model redefines the state of the art in unconditional image modeling, both in terms of existing distribution quality metrics and perceived image quality.
Many machine learning models operate on images, but ignore the fact that images are 2D projections formed by 3D geometry interacting with light, in a process called rendering.
It has recently been shown that the hidden variables of convolutional neural networks make for an efficient perceptual similarity metric that accurately predicts human judgment on relative image similarity assessment.
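A minimal sketch of such a deep-feature distance, using off-the-shelf VGG-16 activations; the layer choice and unit normalization follow common practice and are assumptions here (LPIPS-style metrics additionally learn per-channel weights, omitted for brevity):

```python
import torch
import torchvision.models as models

vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()
layers = {3, 8, 15, 22}  # relu1_2, relu2_2, relu3_3, relu4_3

def deep_features(x):
    # x: [N, 3, H, W], assumed preprocessed with ImageNet normalization.
    feats = []
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in layers:
            # Normalize each spatial feature vector to unit length.
            feats.append(x / (x.norm(dim=1, keepdim=True) + 1e-8))
    return feats

@torch.no_grad()
def perceptual_distance(img_a, img_b):
    # Average squared feature differences across layers and positions.
    return sum(((fa - fb) ** 2).mean()
               for fa, fb in zip(deep_features(img_a), deep_features(img_b)))
```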
Unsupervised image-to-image translation methods learn to map images in a given class to an analogous image in a different class, drawing on unstructured (non-registered) datasets of images.
The ability to automatically estimate the quality and coverage of the samples produced by a generative model is a vital requirement for driving algorithm research.
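One concrete way to estimate quality and coverage separately is a k-nearest-neighbor manifold approximation in a feature space, as in improved precision-and-recall metrics. A minimal sketch, assuming feature embeddings of real and generated samples are computed elsewhere:

```python
import torch

def knn_radii(feats, k=3):
    # Distance to each point's k-th nearest neighbor defines a local
    # hypersphere; the union of spheres approximates the set's manifold.
    d = torch.cdist(feats, feats)
    d.fill_diagonal_(float('inf'))
    return d.sort(dim=1).values[:, k - 1]

def manifold_fraction(queries, refs, k=3):
    # Fraction of query points falling inside the reference manifold.
    radii = knn_radii(refs, k)
    d = torch.cdist(queries, refs)
    return (d <= radii[None, :]).any(dim=1).float().mean()

def precision_recall(real, fake, k=3):
    precision = manifold_fraction(fake, real, k)  # quality of samples
    recall = manifold_fraction(real, fake, k)     # coverage of the data
    return precision, recall
```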
Proximal Policy Optimization (PPO) is a highly popular model-free reinforcement learning (RL) approach.
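At PPO's core is a clipped surrogate objective that keeps each policy update close to the policy that collected the data. A minimal sketch of that objective (the clipping threshold is an illustrative default):

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    # Probability ratio between the current and the data-collecting policy.
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Taking the pessimistic minimum removes the incentive to move the
    # ratio outside [1 - eps, 1 + eps], keeping updates conservative.
    return -torch.min(unclipped, clipped).mean()
```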
We introduce a general-purpose differentiable ray tracer, which, to our knowledge, is the first comprehensive solution that is able to compute derivatives of scalar functions over a rendered image with respect to arbitrary scene parameters such as camera pose, scene geometry, materials, and lighting parameters.
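The practical payoff is gradient-based inverse rendering: define a scalar loss on the rendered image and back-propagate it to scene parameters. The sketch below uses a trivially differentiable stand-in for the tracer to show that pattern; `toy_render` and its parameters are purely illustrative.

```python
import torch

def toy_render(light_intensity, albedo):
    # Stand-in "renderer": a flat 8x8 image shaded by two scene parameters.
    return light_intensity * albedo.view(1, 1, 3).expand(8, 8, 3)

target = toy_render(torch.tensor(0.7), torch.tensor([0.2, 0.5, 0.9]))

light = torch.tensor(1.0, requires_grad=True)   # scene parameters to recover
albedo = torch.tensor([0.5, 0.5, 0.5], requires_grad=True)
optimizer = torch.optim.Adam([light, albedo], lr=0.05)

for step in range(300):
    loss = ((toy_render(light, albedo) - target) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()                             # d(loss) / d(scene params)
    optimizer.step()
# Note: only the product light * albedo is identifiable in this toy setup.
```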
We apply basic statistical reasoning to signal reconstruction by machine learning -- learning to map corrupted observations to clean signals -- with a simple and powerful conclusion: it is possible to learn to restore images by looking only at corrupted examples, with performance matching and sometimes exceeding training on clean data, without explicit image priors or likelihood models of the corruption.
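A minimal sketch of what that training setup looks like: the regression target is simply a second, independently corrupted observation of the same signal. Here a clean batch is used only to synthesize the pair; in the setting the statement targets, only the two noisy copies would be available.

```python
import torch

def noise2noise_step(denoiser, optimizer, clean_batch, sigma=0.1):
    # Two independent Gaussian corruptions of the same underlying signal.
    noisy_in = clean_batch + sigma * torch.randn_like(clean_batch)
    noisy_tgt = clean_batch + sigma * torch.randn_like(clean_batch)
    # L2 regression against the *noisy* target: since the noise is
    # zero-mean, the loss-minimizing prediction is still the clean signal.
    loss = ((denoiser(noisy_in) - noisy_tgt) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```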
We describe a new training methodology for generative adversarial networks.
Our deep neural network learns a mapping from input waveforms to the 3D vertex coordinates of a face model, and simultaneously discovers a compact, latent code that disambiguates the variations in facial expression that cannot be explained by the audio alone.
We present a real-time deep learning framework for video-based facial performance capture -- the dense 3D tracking of an actor's face given a monocular video.