TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Image Generation	ImageNet 256x256	DiT-XL/2	FID	2.27	# 11
Image Generation	ImageNet 512x512	DiT-XL/2	FID	3.04	# 14
Image Generation	ImageNet 512x512	DiT-XL/2	Inception score	240.82	# 7

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/scalable-diffusion-models-with-transformers/image-generation-on-imagenet-256x256)](https://paperswithcode.com/sota/image-generation-on-imagenet-256x256?p=scalable-diffusion-models-with-transformers)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/scalable-diffusion-models-with-transformers/image-generation-on-imagenet-512x512)](https://paperswithcode.com/sota/image-generation-on-imagenet-512x512?p=scalable-diffusion-models-with-transformers)`

Scalable Diffusion Models with Transformers

ICCV 2023 · William Peebles, Saining Xie

We explore a new class of diffusion models based on the transformer architecture. We train latent diffusion models of images, replacing the commonly-used U-Net backbone with a transformer that operates on latent patches. We analyze the scalability of our Diffusion Transformers (DiTs) through the lens of forward pass complexity as measured by Gflops. We find that DiTs with higher Gflops -- through increased transformer depth/width or increased number of input tokens -- consistently have lower FID. In addition to possessing good scalability properties, our largest DiT-XL/2 models outperform all prior diffusion models on the class-conditional ImageNet 512x512 and 256x256 benchmarks, achieving a state-of-the-art FID of 2.27 on the latter.
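The abstract ties compute to the number of input tokens, which depends on the patch size (the "/2" in DiT-XL/2). The sketch below illustrates that relationship. It assumes the latent is produced by an autoencoder that downsamples each spatial side by a factor of 8; that factor is not stated on this page, and `num_tokens` and `vae_downsample` are hypothetical names chosen for illustration.

```python
def num_tokens(image_size: int, patch_size: int, vae_downsample: int = 8) -> int:
    """Count transformer input tokens for a square image.

    Assumption (not stated on this page): the latent has spatial side
    image_size // vae_downsample, and it is split into non-overlapping
    patch_size x patch_size patches, one token per patch.
    """
    if image_size % vae_downsample != 0:
        raise ValueError("image_size must be divisible by vae_downsample")
    latent_size = image_size // vae_downsample
    if latent_size % patch_size != 0:
        raise ValueError("latent size must be divisible by patch_size")
    return (latent_size // patch_size) ** 2


if __name__ == "__main__":
    # Halving the patch size quadruples the token count at a fixed resolution.
    for resolution in (256, 512):
        for patch in (8, 4, 2):
            print(f"{resolution}x{resolution}, patch {patch}: "
                  f"{num_tokens(resolution, patch)} tokens")
```

Under these assumptions, a smaller patch size or a higher resolution both increase the token count, which is one of the two routes to higher Gflops that the abstract mentions (the other being transformer depth and width).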

Code

Repository	Stars	Notes
facebookresearch/DiT	5,077	official; quickstart in Colab, Spaces, Replicate
huggingface/diffusers	22,504	
mindspore-lab/mindone	135	
locuslab/get		
milmor/diffusion-transformer		

See all 7 implementations

Tasks

Image Generation

Results from the Paper

Ranked #11 on Image Generation on ImageNet 256x256

Methods

Concatenated Skip Connection • Convolution • Diffusion • Max Pooling • ReLU • Transformer • U-Net
