Text-to-Video Generation
49 papers with code • 6 benchmarks • 9 datasets
This task refers to generating a video conditioned on a given sentence or sequence of words.
Most implemented papers
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
Its application to audio still lags behind for two main reasons: the lack of large-scale datasets with high-quality text-audio pairs, and the complexity of modeling long continuous audio data.
VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation
A diffusion probabilistic model (DPM), which constructs a forward diffusion process by gradually adding noise to data points and learns the reverse denoising process to generate new samples, has been shown to handle complex data distribution.
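The forward process described above has a convenient closed form: after t noising steps, x_t is a known Gaussian perturbation of the original data point x_0. A minimal sketch of that forward process is below; the linear beta schedule and all variable names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Illustrative sketch of a DPM's forward diffusion process.
# The schedule values below are common defaults, assumed for this example.
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # linear noise schedule beta_1..beta_T
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)      # alpha_bar_t = prod_{s<=t} (1 - beta_s)

def q_sample(x0, t, rng):
    """Draw x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return xt, noise

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 4))     # stand-in "data point"
xt, eps = q_sample(x0, T - 1, rng)   # near t = T, x_t is close to pure noise
```

The reverse (denoising) process that generates new samples is learned by training a network to predict the added noise `eps` from `xt` and `t`, which is the part the video-generation papers above build on.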
Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
Recent text-to-video generation approaches rely on computationally heavy training and require large-scale video datasets.
CelebV-Text: A Large-Scale Facial Text-Video Dataset
This paper presents CelebV-Text, a large-scale, diverse, and high-quality dataset of facial text-video pairs, to facilitate research on facial text-to-video generation tasks.
Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos
Generating text-editable and pose-controllable character videos is in high demand for creating various digital humans.
Generative Disco: Text-to-Video Generation for Music Visualization
Visuals can enhance our experience of music, owing to the way they can amplify the emotions and messages conveyed within it.
Sketching the Future (STF): Applying Conditional Control Techniques to Text-to-Video Models
The proliferation of video content demands efficient and flexible neural network based approaches for generating new video content.
Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation
To fully unlock model capabilities for high-quality video generation and promote the development of the field, we curate a large-scale and open-source video dataset called HD-VG-130M.
ControlVideo: Training-free Controllable Text-to-Video Generation
Text-driven diffusion models have unlocked unprecedented abilities in image generation, whereas their video counterpart still lags behind due to the excessive training cost of temporal modeling.
DirecT2V: Large Language Models are Frame-Level Directors for Zero-Shot Text-to-Video Generation
In the paradigm of AI-generated content (AIGC), there has been increasing attention to transferring knowledge from pre-trained text-to-image (T2I) models to text-to-video (T2V) generation.