Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models

20 Nov 2022  ·  Xichen Pan, Pengda Qin, Yuhong Li, Hui Xue, Wenhu Chen ·

Conditioned diffusion models have demonstrated state-of-the-art text-to-image synthesis capability. Most recent works focus on synthesizing independent images, yet in real-world applications it is common and necessary to generate a series of coherent images for storytelling. In this work, we focus on the story visualization and continuation tasks and propose AR-LDM, a latent diffusion model auto-regressively conditioned on history captions and generated images. Moreover, AR-LDM can generalize to new characters through adaptation. To the best of our knowledge, this is the first work to successfully leverage diffusion models for coherent visual story synthesis. Quantitative results show that AR-LDM achieves SoTA FID scores on PororoSV, FlintstonesSV, and the newly introduced, challenging VIST dataset containing natural images. Large-scale human evaluations show that AR-LDM has superior performance in terms of quality, relevance, and consistency.
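The auto-regressive conditioning described above can be sketched as a simple generation loop: each frame is sampled conditioned on the current caption plus the accumulated history of past captions and previously generated frames. The encoders and the diffusion sampler below are toy stand-ins for illustration only, not the paper's actual modules.

```python
def encode_caption(caption):
    # Stand-in text encoder: returns a toy one-dimensional feature
    # (the caption length), in place of a real text embedding.
    return [float(len(caption))]

def encode_image(image):
    # Stand-in image encoder for previously generated frames.
    return [sum(image) / len(image)]

def sample_latent_diffusion(condition):
    # Stand-in for latent diffusion sampling: deterministically derives
    # a toy "frame" from the conditioning features.
    return [round(sum(condition), 2)]

def generate_story(captions):
    """Generate frames one by one, each conditioned on the current
    caption plus the history of past captions and generated frames,
    mirroring AR-LDM's auto-regressive conditioning."""
    history, frames = [], []
    for caption in captions:
        # Condition on the current caption and the full history so far.
        condition = encode_caption(caption) + history
        frame = sample_latent_diffusion(condition)
        frames.append(frame)
        # Extend the history with both the caption and the new frame,
        # since AR-LDM conditions on past captions and images.
        history = history + encode_caption(caption) + encode_image(frame)
    return frames

frames = generate_story(["Pororo waves", "Pororo and Crong play"])
print(len(frames))  # one frame per caption
```

Because the history grows with each step, later frames depend on all earlier captions and frames, which is what lets the model keep characters and scenes consistent across the story.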


Results from the Paper


Task                 Dataset        Model                    Metric  Value  Global Rank
Story Continuation   FlintstonesSV  AR-LDM                   FID     19.28  #1
Story Visualization  Pororo         AR-LDM                   FID     16.59  #1
Story Continuation   PororoSV       AR-LDM                   FID     17.4   #1
Story Continuation   VIST           AR-LDM (DII captions)    FID     17.03  #2
Story Continuation   VIST           AR-LDM (SIS captions)    FID     16.95  #1
