TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Image Inpainting	CelebA	ConPreDiff	LPIPS	0.022	# 1
Unconditional Image Generation	CelebA-HQ	ConPreDiff	FID	3.22	# 1
Text-to-Image Generation	MS COCO	ConPreDiff	FID	6.21	# 8
Text-to-Image Generation	MS COCO	ConPreDiff	Zero shot FID	6.21	# 1

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/improving-diffusion-based-image-synthesis-1/image-inpainting-on-celeba)](https://paperswithcode.com/sota/image-inpainting-on-celeba?p=improving-diffusion-based-image-synthesis-1)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/improving-diffusion-based-image-synthesis-1/unconditional-image-generation-on-celeba-hq)](https://paperswithcode.com/sota/unconditional-image-generation-on-celeba-hq?p=improving-diffusion-based-image-synthesis-1)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/improving-diffusion-based-image-synthesis-1/text-to-image-generation-on-coco)](https://paperswithcode.com/sota/text-to-image-generation-on-coco?p=improving-diffusion-based-image-synthesis-1)`

Improving Diffusion-Based Image Synthesis with Context Prediction

NeurIPS 2023 · Ling Yang, Jingwei Liu, Shenda Hong, Zhilong Zhang, Zhilin Huang, Zheming Cai, Wentao Zhang, Bin Cui

Diffusion models are a new class of generative models that have dramatically advanced image generation, with unprecedented quality and diversity. Existing diffusion models mainly try to reconstruct the input image from a corrupted one with a pixel-wise or feature-wise constraint along spatial axes. However, such point-based reconstruction may fail to make each predicted pixel/feature fully preserve its neighborhood context, impairing diffusion-based image synthesis. As a powerful source of automatic supervisory signal, context has been well studied for learning representations. Inspired by this, we propose, for the first time, ConPreDiff to improve diffusion-based image synthesis with context prediction. We explicitly reinforce each point to predict its neighborhood context (i.e., multi-stride features/tokens/pixels) with a context decoder at the end of the diffusion denoising blocks in the training stage, and remove the decoder for inference. In this way, each point can better reconstruct itself by preserving its semantic connections with its neighborhood context. This new paradigm of ConPreDiff can generalize to arbitrary discrete and continuous diffusion backbones without introducing extra parameters in the sampling procedure. Extensive experiments are conducted on unconditional image generation, text-to-image generation and image inpainting tasks. Our ConPreDiff consistently outperforms previous methods and achieves new SOTA text-to-image generation results on MS-COCO, with a zero-shot FID score of 6.21.
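The idea of adding a context-prediction term during training, and dropping it at inference, can be illustrated with a toy sketch. This is not the authors' implementation: the scalar per-point features, the one-weight-per-neighbor "decoder", the MSE context loss, the zero padding at image borders, and the names `gather_context`, `context_loss`, `training_loss` and `lam` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]  # toy feature map: one scalar feature per spatial point


def neighbor_offsets(stride: int) -> List[Tuple[int, int]]:
    """The 8 offsets surrounding a point at the given stride."""
    return [(dy * stride, dx * stride)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            if (dy, dx) != (0, 0)]


def gather_context(grid: Grid, y: int, x: int, strides: Sequence[int]) -> List[float]:
    """Collect the multi-stride neighborhood of (y, x).

    Out-of-bounds neighbors are zero-padded (an assumption of this sketch).
    """
    h, w = len(grid), len(grid[0])
    context = []
    for s in strides:
        for dy, dx in neighbor_offsets(s):
            ny, nx = y + dy, x + dx
            inside = 0 <= ny < h and 0 <= nx < w
            context.append(grid[ny][nx] if inside else 0.0)
    return context


def point_loss(pred: Grid, target: Grid) -> float:
    """Point-wise reconstruction term: mean squared error over all points."""
    h, w = len(target), len(target[0])
    total = sum((pred[y][x] - target[y][x]) ** 2 for y in range(h) for x in range(w))
    return total / (h * w)


def context_loss(pred: Grid, target: Grid,
                 decoder_weights: Sequence[float], strides: Sequence[int]) -> float:
    """Toy context decoder: neighbor k is predicted as decoder_weights[k] * pred[y][x].

    The prediction is compared with the true neighborhood taken from the target.
    """
    h, w = len(target), len(target[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            true_context = gather_context(target, y, x, strides)
            for k, true_value in enumerate(true_context):
                total += (decoder_weights[k] * pred[y][x] - true_value) ** 2
                count += 1
    return total / count


def training_loss(pred: Grid, target: Grid, decoder_weights: Sequence[float],
                  strides: Sequence[int] = (1, 2), lam: float = 0.1) -> float:
    """Training objective of the sketch: point term plus weighted context term.

    At inference only `pred` is used; the decoder weights are discarded.
    """
    return point_loss(pred, target) + lam * context_loss(pred, target, decoder_weights, strides)
```

For a 3x3 map with strides `(1, 2)`, `gather_context` returns 16 values per point (8 neighbors per stride), so `decoder_weights` needs 16 entries. Because the decoder appears only inside `training_loss`, sampling in this sketch uses no extra parameters, which mirrors the property the abstract describes.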

Code

No code implementations yet.

Tasks

Denoising

Image Generation

Image Inpainting

Text-to-Image Generation

Unconditional Image Generation

Datasets

ImageNet

MS COCO

CelebA

FFHQ

CelebA-HQ

Results from the Paper

Ranked #1 on Image Inpainting on CelebA (LPIPS metric)

Methods

Diffusion • Inpainting
