Soft Truncation: A Universal Training Technique of Score-based Diffusion Model for High Precision Score Estimation

10 Jun 2021  ·  Dongjun Kim, Seungjae Shin, Kyungwoo Song, Wanmo Kang, Il-Chul Moon

Recent advances in diffusion models have brought state-of-the-art performance on image generation tasks. However, empirical results from previous research on diffusion models imply an inverse correlation between density estimation and sample generation performance. This paper provides empirical evidence that this inverse correlation arises because density estimation is dominated by small diffusion times, whereas sample generation depends mainly on large diffusion times. Training a score network well across the entire diffusion time range is demanding, however, because the loss scale is significantly imbalanced across diffusion times. For successful training, we therefore introduce Soft Truncation, a universally applicable training technique for diffusion models that softens the fixed, static truncation hyperparameter into a random variable. In experiments, Soft Truncation achieves state-of-the-art performance on the CIFAR-10, CelebA, CelebA-HQ 256x256, and STL-10 datasets.
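
The idea of turning the truncation hyperparameter into a random variable can be illustrated with a minimal sketch of diffusion-time sampling. This is not the authors' implementation: the function names, the default values, and in particular the log-uniform prior over the truncation point are assumptions made here for illustration, since the abstract does not specify them.

```python
import math
import random


def sample_times_hard(batch_size, eps=1e-5, T=1.0, rng=random):
    """Baseline: diffusion times drawn uniformly from [eps, T] with a fixed eps."""
    return [rng.uniform(eps, T) for _ in range(batch_size)]


def sample_times_soft(batch_size, eps_min=1e-5, T=1.0, rng=random):
    """Soft-truncation-style sampling (illustrative sketch).

    A truncation point tau is drawn once per mini-batch, then every diffusion
    time in the batch is drawn uniformly from [tau, T].
    ASSUMPTION: tau is log-uniform on [eps_min, T]; the prior used in the
    paper may differ.
    """
    log_tau = rng.uniform(math.log(eps_min), math.log(T))
    tau = math.exp(log_tau)
    times = [rng.uniform(tau, T) for _ in range(batch_size)]
    return tau, times


if __name__ == "__main__":
    rng = random.Random(0)
    for step in range(3):
        tau, times = sample_times_soft(4, rng=rng)
        print(f"step {step}: tau={tau:.2e}, min t={min(times):.3f}")
```

Under this sketch, some mini-batches see a very small truncation point and therefore train on small diffusion times, while others are restricted to larger times, so no single fixed truncation value has to balance both regimes.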

Results from the Paper


Ranked #2 on Image Generation on CIFAR-10 (Inception score metric)

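| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Image Generation | CelebA 64x64 | DDPM++ (VP, NLL) + ST | FID | 2.9 | #12 |
| Image Generation | CelebA 64x64 | DDPM++ (VP, NLL) + ST | bits/dimension | 1.96 | #3 |
| Image Generation | CelebA 64x64 | UNCSN++ (RVE) + ST | bits/dimension | 1.97 | #4 |
| Image Generation | CelebA 64x64 | DDPM++ (VP, FID) + ST | FID | 1.9 | #8 |
| Image Generation | CelebA 64x64 | DDPM++ (VP, FID) + ST | bits/dimension | 2.1 | #6 |
| Image Generation | CelebA-HQ 256x256 | UNCSN++ (RVE) + ST | FID | 7.16 | #7 |
| Image Generation | CIFAR-10 | UDM (RVE) + ST | FID | 2.33 | #30 |
| Image Generation | CIFAR-10 | UDM (RVE) + ST | bits/dimension | 3.04 | #35 |
| Image Generation | CIFAR-10 | DDPM++ (VP, NLL) + ST | Inception score | 9.17 | #23 |
| Image Generation | CIFAR-10 | DDPM++ (VP, NLL) + ST | FID | 3.45 | #50 |
| Image Generation | CIFAR-10 | DDPM++ (VP, NLL) + ST | bits/dimension | 2.88 | #18 |
| Image Generation | CIFAR-10 | DDPM++ (VP, FID) + ST | Inception score | 9.78 | #10 |
| Image Generation | CIFAR-10 | DDPM++ (VP, FID) + ST | FID | 2.47 | #35 |
| Image Generation | CIFAR-10 | DDPM++ (VP, FID) + ST | bits/dimension | 2.91 | #23 |
| Image Generation | CIFAR-10 | UNCSN++ (RVE) + ST | Inception score | 10.11 | #2 |
| Image Generation | FFHQ 256 x 256 | UDM (RVE) + ST | FID | 5.54 | #19 |
| Image Generation | ImageNet 32x32 | DDPM++ (VP, NLL) + ST | bpd | 3.85 | #12 |
| Image Generation | ImageNet 32x32 | DDPM++ (VP, NLL) + ST | FID | 8.42 | #3 |
| Image Generation | ImageNet 32x32 | DDPM++ (VP, NLL) + ST | Inception score | 11.82 | #1 |
| Image Generation | LSUN Bedroom 256 x 256 | UDM (RVE) + ST | FID | 4.57 | #10 |
| Image Generation | STL-10 | UNCSN++ (RVE) + ST | FID | 7.71 | #2 |
| Image Generation | STL-10 | UNCSN++ (RVE) + ST | Inception score | 13.43 | #1 |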

Methods