TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Text-to-Video Generation	MSR-VTT	PixelDance	CLIPSIM	0.3125	# 1
Text-to-Video Generation	MSR-VTT	PixelDance	FVD	381	# 4
Text-to-Video Generation	UCF-101	PixelDance (Zero-shot, 256x256)	FVD16	242.82	# 3
Video Generation	UCF-101	PixelDance (256x256, text-conditional)	Inception Score	42.10	# 16
Video Generation	UCF-101	PixelDance (256x256, text-conditional)	FVD16	242.82	# 9
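
For readers unfamiliar with the CLIPSIM metric listed above: it is commonly described as the average cosine similarity between a CLIP embedding of the text prompt and CLIP embeddings of the generated frames. The sketch below illustrates only that averaging step on precomputed embedding vectors. The function names `cosine` and `clipsim` are hypothetical, and the CLIP model variant and exact evaluation protocol behind the reported 0.3125 are not stated on this page, so treat this as an assumption-laden illustration rather than the benchmark's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def clipsim(text_embedding, frame_embeddings):
    """Average text-to-frame cosine similarity over all frames.

    Assumption: embeddings were already produced by some CLIP model;
    this helper (hypothetical name) only performs the averaging.
    """
    sims = [cosine(text_embedding, frame) for frame in frame_embeddings]
    return sum(sims) / len(sims)


# Toy vectors, not real CLIP outputs: one frame matches the text, one is orthogonal.
score = clipsim([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

Higher values indicate that frames are, on average, closer to the prompt in the embedding space.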

Make Pixels Dance: High-Dynamic Video Generation

18 Nov 2023 · Yan Zeng, Guoqiang Wei, Jiani Zheng, Jiaxin Zou, Yang Wei, Yuchen Zhang, Hang Li

Creating high-dynamic videos such as motion-rich actions and sophisticated visual effects poses a significant challenge in the field of artificial intelligence. Unfortunately, current state-of-the-art video generation methods, primarily focusing on text-to-video generation, tend to produce video clips with minimal motions despite maintaining high fidelity. We argue that relying solely on text instructions is insufficient and suboptimal for video generation. In this paper, we introduce PixelDance, a novel approach based on diffusion models that incorporates image instructions for both the first and last frames in conjunction with text instructions for video generation. Comprehensive experimental results demonstrate that PixelDance trained with public data exhibits significantly better proficiency in synthesizing videos with complex scenes and intricate motions, setting a new standard for video generation.

Code

No code implementations yet.

Tasks

Text-to-Video Generation

Video Generation

Datasets

UCF101

MSR-VTT

WebVid

InternVid

Results from the Paper

Ranked #3 on Text-to-Video Generation on UCF-101

Methods

Diffusion
