Text-to-Video Generation

49 papers with code • 6 benchmarks • 9 datasets

This task refers to video generation based on a given sentence or sequence of words.

Most implemented papers

Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models

text-to-audio/make-an-audio 30 Jan 2023

The application of large-scale generative modeling to audio still lags behind for two main reasons: the lack of large-scale datasets with high-quality text-audio pairs, and the complexity of modeling long, continuous audio data.

VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation

modelscope/modelscope CVPR 2023

A diffusion probabilistic model (DPM), which constructs a forward diffusion process by gradually adding noise to data points and learns the reverse denoising process to generate new samples, has been shown to handle complex data distribution.
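The forward process described above has a convenient closed form: after t steps of Gaussian noising, x_t can be sampled directly from x_0. A minimal sketch below illustrates this, assuming a standard linear beta schedule; the names (T, betas, alpha_bars, q_sample) are illustrative and not taken from any specific paper's code.

```python
import numpy as np

T = 1000                                  # number of diffusion steps
betas = np.linspace(1e-4, 0.02, T)        # linear noise schedule (assumed)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)           # cumulative product, \bar{alpha}_t

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))          # a toy "data point"
x_noisy, eps = q_sample(x0, t=T - 1, rng=rng)
# At the final step, alpha_bar_t is near zero, so x_t is almost pure
# Gaussian noise; the learned reverse model predicts eps and inverts
# this process step by step to generate new samples.
```

Training then reduces to regressing the injected noise eps from (x_t, t); sampling runs the learned denoiser backwards from pure noise.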

Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators

picsart-ai-research/text2video-zero ICCV 2023

Recent text-to-video generation approaches rely on computationally heavy training and require large-scale video datasets.

CelebV-Text: A Large-Scale Facial Text-Video Dataset

CelebV-Text/CelebV-Text CVPR 2023

This paper presents CelebV-Text, a large-scale, diverse, and high-quality dataset of facial text-video pairs, to facilitate research on facial text-to-video generation tasks.

Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos

mayuelala/followyourpose 3 Apr 2023

Generating text-editable and pose-controllable character videos is in high demand for creating diverse digital humans.

Generative Disco: Text-to-Video Generation for Music Visualization

hellovivian/generative-disco 17 Apr 2023

Visuals can enhance our experience of music, owing to the way they can amplify the emotions and messages conveyed within it.

Sketching the Future (STF): Applying Conditional Control Techniques to Text-to-Video Models

rohandkn/skribble2vid 10 May 2023

The proliferation of video content demands efficient and flexible neural-network-based approaches for generating new video content.

Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation

daooshee/hd-vg-130m 18 May 2023

Moreover, to fully unlock model capabilities for high-quality video generation and promote the development of the field, we curate a large-scale and open-source video dataset called HD-VG-130M.

ControlVideo: Training-free Controllable Text-to-Video Generation

ybybzhang/controlvideo 22 May 2023

Text-driven diffusion models have unlocked unprecedented abilities in image generation, whereas their video counterpart still lags behind due to the excessive training cost of temporal modeling.

DirecT2V: Large Language Models are Frame-Level Directors for Zero-Shot Text-to-Video Generation

ku-cvlab/direct2v 23 May 2023

In the paradigm of AI-generated content (AIGC), there has been increasing attention to transferring knowledge from pre-trained text-to-image (T2I) models to text-to-video (T2V) generation.