Search Results for author: Robin Rombach

Found 23 papers, 17 papers with code

SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion

no code implementations • 18 Mar 2024 • Vikram Voleti, Chun-Han Yao, Mark Boss, Adam Letts, David Pankratz, Dmitry Tochilkin, Christian Laforte, Robin Rombach, Varun Jampani

In this work, we propose SV3D that adapts image-to-video diffusion model for novel multi-view synthesis and 3D generation, thereby leveraging the generalization and multi-view consistency of the video models, while further adding explicit camera control for NVS.

3D Generation 3D Reconstruction +2

Paper
Add Code

Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation

no code implementations • 18 Mar 2024 • Axel Sauer, Frederic Boesel, Tim Dockhorn, Andreas Blattmann, Patrick Esser, Robin Rombach

Distillation methods, like the recently introduced adversarial diffusion distillation (ADD) aim to shift the model from many-shot to single-step inference, albeit at the cost of expensive and difficult optimization due to its reliance on a fixed pretrained DINOv2 discriminator.

Image Generation

Paper
Add Code

Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

1 code implementation • 5 Mar 2024 • Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek, Robin Rombach

Rectified flow is a recent generative model formulation that connects data and noise in a straight line.

Reading Comprehension Text-to-Image Generation

Paper
Code

aMUSEd: An Open MUSE Reproduction

1 code implementation • 3 Jan 2024 • Suraj Patil, William Berman, Robin Rombach, Patrick von Platen

We present aMUSEd, an open-source, lightweight masked image model (MIM) for text-to-image generation based on MUSE.

Text-to-Image Generation

Paper
Code

DiffusionSat: A Generative Foundation Model for Satellite Imagery

no code implementations • 6 Dec 2023 • Samar Khanna, Patrick Liu, Linqi Zhou, Chenlin Meng, Robin Rombach, Marshall Burke, David Lobell, Stefano Ermon

Our method outperforms previous state-of-the-art methods for satellite image generation and is the first large-scale $\textit{generative}$ foundation model for satellite imagery.

Crop Yield Prediction Image Generation

Paper
Add Code

Adversarial Diffusion Distillation

4 code implementations • 28 Nov 2023 • Axel Sauer, Dominik Lorenz, Andreas Blattmann, Robin Rombach

We introduce Adversarial Diffusion Distillation (ADD), a novel training approach that efficiently samples large-scale foundational image diffusion models in just 1-4 steps while maintaining high image quality.

Image Generation

22,242

Paper
Code

Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

2 code implementations • None 2023 • Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram Voleti, Adam Letts, Varun Jampani, Robin Rombach

We then explore the impact of finetuning our base model on high-quality data and train a text-to-video model that is competitive with closed-source video generation.

Image Generation Image to Video Generation

22,242

Paper
Code

SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

3 code implementations • 4 Jul 2023 • Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, Robin Rombach

We present SDXL, a latent diffusion model for text-to-image synthesis.

Image Generation

22,242

Paper
Code

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

no code implementations • CVPR 2023 • Seung Wook Kim, Bradley Brown, Kangxue Yin, Karsten Kreis, Katja Schwarz, Daiqing Li, Robin Rombach, Antonio Torralba, Sanja Fidler

We first train a scene auto-encoder to express a set of image and pose pairs as a neural field, represented as density and feature voxel grids that can be projected to produce novel views of the scene.

Scene Generation

Paper
Add Code

Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models

2 code implementations • CVPR 2023 • Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, Karsten Kreis

We first pre-train an LDM on images only; then, we turn the image generator into a video generator by introducing a temporal dimension to the latent space diffusion model and fine-tuning on encoded image sequences, i. e., videos.

Ranked #5 on Text-to-Video Generation on MSR-VTT (CLIP-FID metric)

Image Generation Text-to-Video Generation +3

22,242

Paper
Code

On Distillation of Guided Diffusion Models

2 code implementations • CVPR 2023 • Chenlin Meng, Robin Rombach, Ruiqi Gao, Diederik P. Kingma, Stefano Ermon, Jonathan Ho, Tim Salimans

For standard diffusion models trained on the pixel-space, our approach is able to generate images visually comparable to that of the original model using as few as 4 sampling steps on ImageNet 64x64 and CIFAR-10, achieving FID/IS scores comparable to that of the original model while being up to 256 times faster to sample from.

Denoising Image Generation +1

909

Paper
Code

Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models

1 code implementation • 26 Jul 2022 • Robin Rombach, Andreas Blattmann, Björn Ommer

In RDMs, a set of nearest neighbors is retrieved from an external database during training for each training instance, and the diffusion model is conditioned on these informative samples.

Image Generation Prompt Engineering +1

10,562

Paper
Code

Semi-Parametric Neural Image Synthesis

2 code implementations • 25 Apr 2022 • Andreas Blattmann, Robin Rombach, Kaan Oktay, Jonas Müller, Björn Ommer

Much of this success is due to the scalability of these architectures and hence caused by a dramatic increase in model complexity and in the computational resources invested in training these models.

Image Generation Retrieval

10,562

Paper
Code

High-Resolution Image Synthesis with Latent Diffusion Models

32 code implementations • CVPR 2022 • Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer

By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond.

Ranked #2 on Layout-to-Image Generation on COCO-Stuff 256x256

Denoising Image Inpainting +5

65,347

Paper
Code

ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive Image Synthesis

no code implementations • NeurIPS 2021 • Patrick Esser, Robin Rombach, Andreas Blattmann, Björn Ommer

Thus, in contrast to pure autoregressive models, it can solve free-form image inpainting and, in the case of conditional models, local, text-guided image modification without requiring mask-specific training.

Ranked #4 on Text-to-Image Generation on Conceptual Captions

Image Inpainting Text-to-Image Generation

Paper
Add Code

High-Resolution Complex Scene Synthesis with Transformers

no code implementations • 13 May 2021 • Manuel Jahn, Robin Rombach, Björn Ommer

The use of coarse-grained layouts for controllable synthesis of complex scene images via deep generative models has recently gained popularity.

Vocal Bursts Intensity Prediction

Paper
Add Code

Stochastic Image-to-Video Synthesis using cINNs

1 code implementation • CVPR 2021 • Michael Dorkenwald, Timo Milbich, Andreas Blattmann, Robin Rombach, Konstantinos G. Derpanis, Björn Ommer

Video understanding calls for a model to learn the characteristic interplay between static scene content and its dynamics: Given an image, the model must be able to predict a future progression of the portrayed scene and, conversely, a video should be explained in terms of its static image content and all the remaining characteristics not present in the initial frame.

Video Understanding

179

Paper
Code

Geometry-Free View Synthesis: Transformers and no 3D Priors

1 code implementation • ICCV 2021 • Robin Rombach, Patrick Esser, Björn Ommer

Is a geometric model required to synthesize novel views from a single image?

Ranked #1 on Novel View Synthesis on RealEstate10K

Novel View Synthesis

361

Paper
Code

Taming Transformers for High-Resolution Image Synthesis

12 code implementations • CVPR 2021 • Patrick Esser, Robin Rombach, Björn Ommer

We demonstrate how combining the effectiveness of the inductive bias of CNNs with the expressivity of transformers enables them to model and thereby synthesize high-resolution images.

Ranked #3 on Text-to-Image Generation on LHQC

DeepFake Detection Image Outpainting +4

5,361

Paper
Code

A Note on Data Biases in Generative Models

1 code implementation • 4 Dec 2020 • Patrick Esser, Robin Rombach, Björn Ommer

It is tempting to think that machines are less prone to unfairness and prejudice.

219

Paper
Code

Making Sense of CNNs: Interpreting Deep Representations & Their Invariances with INNs

1 code implementation • ECCV 2020 • Robin Rombach, Patrick Esser, Björn Ommer

To open such a black box, it is, therefore, crucial to uncover the different semantic concepts a model has learned as well as those that it has learned to be invariant to.

Paper
Code

Network-to-Network Translation with Conditional Invertible Neural Networks

1 code implementation • NeurIPS 2020 • Robin Rombach, Patrick Esser, Björn Ommer

Given the ever-increasing computational costs of modern machine learning models, we need to find new ways to reuse such expert models and thus tap into the resources that have been invested in their creation.

Image-to-Image Translation Text-to-Image Generation +1

219

Paper
Code

A Disentangling Invertible Interpretation Network for Explaining Latent Representations

2 code implementations • CVPR 2020 • Patrick Esser, Robin Rombach, Björn Ommer

We formulate interpretation as a translation of hidden representations onto semantic concepts that are comprehensible to the user.

Image Generation Image Manipulation

120

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.