Search Results for author: Chitwan Saharia

Found 16 papers, 8 papers with code

TryOnDiffusion: A Tale of Two UNets

1 code implementation • CVPR 2023 • Luyang Zhu, Dawei Yang, Tyler Zhu, Fitsum Reda, William Chan, Chitwan Saharia, Mohammad Norouzi, Ira Kemelmacher-Shlizerman

Given two images depicting a person and a garment worn by another person, our goal is to generate a visualization of how the garment might look on the input person.

Virtual Try-on

112

Synthetic Data from Diffusion Models Improves ImageNet Classification

no code implementations • 17 Apr 2023 • Shekoofeh Azizi, Simon Kornblith, Chitwan Saharia, Mohammad Norouzi, David J. Fleet

Deep generative models are becoming increasingly powerful, now generating diverse high fidelity photo-realistic samples given text prompts.

Classification • Data Augmentation

Denoising Diffusion Probabilistic Models for Robust Image Super-Resolution in the Wild

no code implementations • 15 Feb 2023 • Hshmat Sahak, Daniel Watson, Chitwan Saharia, David Fleet

Diffusion models have shown promising results on single-image super-resolution and other image-to-image translation tasks.

Blind Super-Resolution • Denoising • +2

Character-Aware Models Improve Visual Text Rendering

1 code implementation • 20 Dec 2022 • Rosanne Liu, Dan Garrette, Chitwan Saharia, William Chan, Adam Roberts, Sharan Narang, Irina Blok, RJ Mical, Mohammad Norouzi, Noah Constant

In the text-only domain, we find that character-aware models provide large gains on a novel spelling task (WikiSpell).

Image Generation

168

Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting

no code implementations • CVPR 2023 • Su Wang, Chitwan Saharia, Ceslee Montgomery, Jordi Pont-Tuset, Shai Noy, Stefano Pellegrini, Yasumasa Onoe, Sarah Laszlo, David J. Fleet, Radu Soricut, Jason Baldridge, Mohammad Norouzi, Peter Anderson, William Chan

Through extensive human evaluation on EditBench, we find that object-masking during training leads to across-the-board improvements in text-image alignment -- such that Imagen Editor is preferred over DALL-E 2 and Stable Diffusion -- and, as a cohort, these models are better at object-rendering than text-rendering, and handle material/color/size attributes better than count/shape attributes.

Image Inpainting • Object • +1

Imagen Video: High Definition Video Generation with Diffusion Models

no code implementations • 5 Oct 2022 • Jonathan Ho, William Chan, Chitwan Saharia, Jay Whang, Ruiqi Gao, Alexey Gritsenko, Diederik P. Kingma, Ben Poole, Mohammad Norouzi, David J. Fleet, Tim Salimans

We present Imagen Video, a text-conditional video generation system based on a cascade of video diffusion models.

Ranked #1 on Video Generation on LAION-400M

Image Generation • Video Generation • +3

Re-Imagen: Retrieval-Augmented Text-to-Image Generator

no code implementations • 29 Sep 2022 • Wenhu Chen, Hexiang Hu, Chitwan Saharia, William W. Cohen

To further evaluate the capabilities of the model, we introduce EntityDrawBench, a new benchmark that evaluates image generation for diverse entities, from frequent to rare, across multiple object categories including dogs, foods, landmarks, birds, and characters.

Ranked #3 on Text-to-Image Generation on MS COCO

Retrieval • Text Retrieval • +1

Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

4 code implementations • 23 May 2022 • Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, Mohammad Norouzi

We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding.

Ranked #17 on Text-to-Image Generation on MS COCO (using extra training data)

7,771

Deblurring via Stochastic Refinement

no code implementations • CVPR 2022 • Jay Whang, Mauricio Delbracio, Hossein Talebi, Chitwan Saharia, Alexandros G. Dimakis, Peyman Milanfar

Unlike existing techniques, we train a stochastic sampler that refines the output of a deterministic predictor and is capable of producing a diverse set of plausible reconstructions for a given input.

Deblurring • Image Deblurring

Palette: Image-to-Image Diffusion Models

4 code implementations • 10 Nov 2021 • Chitwan Saharia, William Chan, Huiwen Chang, Chris A. Lee, Jonathan Ho, Tim Salimans, David J. Fleet, Mohammad Norouzi

We expect this standardized evaluation protocol to play a role in advancing image-to-image translation research.

Ranked #1 on Colorization on ImageNet ctest10k

Colorization • Denoising • +5

1,372

Cascaded Diffusion Models for High Fidelity Image Generation

no code implementations • 30 May 2021 • Jonathan Ho, Chitwan Saharia, William Chan, David J. Fleet, Mohammad Norouzi, Tim Salimans

We show that cascaded diffusion models are capable of generating high fidelity images on the class-conditional ImageNet generation benchmark, without any assistance from auxiliary image classifiers to boost sample quality.

Ranked #3 on Image Generation on ImageNet 64x64

Data Augmentation • Image Generation • +2

Image Super-Resolution via Iterative Refinement

4 code implementations • 15 Apr 2021 • Chitwan Saharia, Jonathan Ho, William Chan, Tim Salimans, David J. Fleet, Mohammad Norouzi

We present SR3, an approach to image Super-Resolution via Repeated Refinement.

Conditional Image Generation • Denoising • +1

3,343

Non-Autoregressive Machine Translation with Latent Alignments

2 code implementations • EMNLP 2020 • Chitwan Saharia, William Chan, Saurabh Saxena, Mohammad Norouzi

In addition, we adapt the Imputer model for non-autoregressive machine translation and demonstrate that Imputer with just 4 generation steps can match the performance of an autoregressive Transformer baseline.

Machine Translation • Translation

Imputer: Sequence Modelling via Imputation and Dynamic Programming

1 code implementation • ICML 2020 • William Chan, Chitwan Saharia, Geoffrey Hinton, Mohammad Norouzi, Navdeep Jaitly

This paper presents the Imputer, a neural sequence model that generates output sequences iteratively via imputations.

Imputation • speech-recognition • +1

Combating False Negatives in Adversarial Imitation Learning

no code implementations • 2 Feb 2020 • Konrad Zolna, Chitwan Saharia, Leonard Boussioux, David Yu-Tung Hui, Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Yoshua Bengio

In adversarial imitation learning, a discriminator is trained to differentiate agent episodes from expert demonstrations representing the desired behavior.

Imitation Learning

BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning

6 code implementations • ICLR 2019 • Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Salem Lahlou, Lucas Willems, Chitwan Saharia, Thien Huu Nguyen, Yoshua Bengio

Allowing humans to interactively train artificial agents to understand language instructions is desirable for both practical and scientific reasons, but given the poor data efficiency of the current learning methods, this goal may require substantial research efforts.

Grounded language learning

2,010
