Palette: Image-to-Image Diffusion Models

We introduce Palette, a simple and general framework for image-to-image translation using conditional diffusion models. On four challenging image-to-image translation tasks (colorization, inpainting, uncropping, and JPEG decompression), Palette outperforms strong GAN and regression baselines, and establishes a new state of the art. This is accomplished without task-specific hyper-parameter tuning, architecture customization, or any auxiliary loss, demonstrating a desirable degree of generality and flexibility. We uncover the impact of using $L_2$ vs. $L_1$ loss in the denoising diffusion objective on sample diversity, and demonstrate the importance of self-attention through empirical architecture studies. Importantly, we advocate a unified evaluation protocol based on ImageNet, and report several sample quality scores, including FID, Inception Score (IS), Classification Accuracy (CA) of a pre-trained ResNet-50, and Perceptual Distance (PD) against reference images, for various baselines. We expect this standardized evaluation protocol to play a critical role in advancing image-to-image translation research. Finally, we show that a single generalist Palette model trained on three tasks (colorization, inpainting, and JPEG decompression) performs as well as or better than task-specific specialist counterparts.
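
The $L_1$ vs. $L_2$ choice mentioned in the abstract can be illustrated with a minimal sketch. It assumes the noise-prediction form of the denoising objective that is common in diffusion models: a clean target is mixed with Gaussian noise at level `gamma`, and a network conditioned on the source image is trained to predict that noise under an $L_p$ penalty. The function names, the flat-list image representation, and `toy_denoiser` are hypothetical stand-ins for illustration, not the paper's implementation.

```python
import math
import random


def add_noise(y0, gamma, rng):
    """Corrupt a clean target y0 (flat list of pixel values) at noise level gamma.

    Returns the noisy target and the Gaussian noise that was mixed in.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in y0]
    scale_signal = math.sqrt(gamma)
    scale_noise = math.sqrt(1.0 - gamma)
    y_noisy = [scale_signal * y + scale_noise * e for y, e in zip(y0, eps)]
    return y_noisy, eps


def denoising_loss(eps_pred, eps, p=2):
    """Mean of |eps_pred - eps|**p over pixels; p=1 gives L1, p=2 gives L2."""
    if p not in (1, 2):
        raise ValueError("p must be 1 or 2")
    return sum(abs(a - b) ** p for a, b in zip(eps_pred, eps)) / len(eps)


def toy_denoiser(x, y_noisy, gamma):
    """Hypothetical placeholder for a trained network; always predicts zero noise."""
    return [0.0 for _ in y_noisy]


rng = random.Random(0)
x = [0.5, 0.5, 0.5, 0.5]   # conditioning image, e.g. a grayscale input
y0 = [0.1, 0.4, 0.6, 0.9]  # clean target image
y_noisy, eps = add_noise(y0, gamma=0.7, rng=rng)
eps_pred = toy_denoiser(x, y_noisy, 0.7)
loss_l1 = denoising_loss(eps_pred, eps, p=1)
loss_l2 = denoising_loss(eps_pred, eps, p=2)
```

Only the exponent `p` differs between the two variants, so switching between them is a one-argument change in a sketch like this; the effect on sample diversity that the abstract reports is an empirical finding and is not reproduced here.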

Results from the Paper


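Task                Dataset       Model                          Metric     Value   Global Rank
JPEG Decompression  ImageNet      Palette (QF: 10)               FID-5K     5.4     #2
JPEG Decompression  ImageNet      Palette (QF: 10)               IS         180.5   #2
JPEG Decompression  ImageNet      Palette (QF: 10)               CA         70.7    #2
JPEG Decompression  ImageNet      Palette (QF: 10)               PD         58.3    #2
JPEG Decompression  ImageNet      Palette (QF: 20)               FID-5K     4.3     #1
JPEG Decompression  ImageNet      Palette (QF: 20)               IS         208.7   #1
JPEG Decompression  ImageNet      Palette (QF: 20)               CA         73.5    #1
JPEG Decompression  ImageNet      Palette (QF: 20)               PD         37.1    #1
JPEG Decompression  ImageNet      Regression (QF: 20)            FID-5K     11.5    #4
JPEG Decompression  ImageNet      Regression (QF: 20)            IS         158.7   #3
JPEG Decompression  ImageNet      Regression (QF: 20)            CA         69.7    #3
JPEG Decompression  ImageNet      Regression (QF: 20)            PD         65.4    #3
JPEG Decompression  ImageNet      Palette (QF: 5)                FID-5K     8.3     #3
JPEG Decompression  ImageNet      Palette (QF: 5)                IS         133.6   #4
JPEG Decompression  ImageNet      Palette (QF: 5)                CA         64.2    #4
JPEG Decompression  ImageNet      Palette (QF: 5)                PD         95.5    #4
JPEG Decompression  ImageNet      Regression (QF: 5)             FID-5K     29.0    #6
JPEG Decompression  ImageNet      Regression (QF: 5)             IS         73.9    #6
JPEG Decompression  ImageNet      Regression (QF: 5)             CA         52.8    #6
JPEG Decompression  ImageNet      Regression (QF: 5)             PD         155.4   #6
JPEG Decompression  ImageNet      Regression (QF: 10)            FID-5K     18.0    #5
JPEG Decompression  ImageNet      Regression (QF: 10)            IS         117.2   #5
JPEG Decompression  ImageNet      Regression (QF: 10)            CA         63.5    #5
JPEG Decompression  ImageNet      Regression (QF: 10)            PD         102.2   #5
Colorization        ImageNet val  Palette                        FID-5K     15.78   #1
Uncropping          Places2 val   Palette                        FID        3.53    #1
Uncropping          Places2 val   Palette                        PD         103.3   #1
Uncropping          Places2 val   Palette                        Fool rate  39.9    #1
Image Inpainting    Places2 val   Palette (20-30% free form)     FID        11.7    #1
Image Inpainting    Places2 val   Palette (20-30% free form)     PD         35.0    #1
Image Inpainting    Places2 val   Palette (128×128 center mask)  FID        11.9    #2
Image Inpainting    Places2 val   Palette (128×128 center mask)  PD         57.3    #2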
