Text-to-Video Generation
75 papers with code • 6 benchmarks • 9 datasets
This task refers to video generation based on a given sentence or sequence of words.
Libraries
Use these libraries to find Text-to-Video Generation models and implementations.
Most implemented papers
ModelScope Text-to-Video Technical Report
This paper introduces ModelScopeT2V, a text-to-video synthesis model that evolves from a text-to-image synthesis model (i.e., Stable Diffusion).
Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
We first pre-train an LDM on images only; then, we turn the image generator into a video generator by introducing a temporal dimension to the latent space diffusion model and fine-tuning on encoded image sequences, i.e., videos.
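The idea of adding a temporal dimension to an image model can be illustrated with a small NumPy sketch. It is not the paper's implementation: the function names, the toy weights, and the blend factor `alpha` are assumptions made for illustration. The image-pretrained layer treats each frame independently by folding time into the batch axis, and a new temporal layer then mixes information across frames.

```python
import numpy as np

def spatial_layer(x, w):
    """Stand-in for a frozen, image-pretrained layer (hypothetical).

    Mixes channels within each frame.
    x: (batch * frames, channels, height, width); w: (channels, channels).
    """
    return np.einsum("oc,nchw->nohw", w, x)

def temporal_layer(x, k):
    """Stand-in for a newly added, trainable layer (hypothetical).

    Mixes information across frames at every pixel.
    x: (batch, frames, channels, height, width); k: (frames, frames).
    """
    return np.einsum("st,btchw->bschw", k, x)

def video_block(z, w, k, alpha):
    """Apply the spatial layer frame by frame, then blend in temporal mixing."""
    b, t, c, h, wd = z.shape
    # Fold time into the batch axis so the image layer sees independent frames.
    frames = z.reshape(b * t, c, h, wd)
    spatial = spatial_layer(frames, w).reshape(b, t, c, h, wd)
    # With time unfolded again, the temporal layer mixes along the frame axis.
    temporal = temporal_layer(spatial, k)
    # alpha = 1.0 recovers the image-only model; lower values admit temporal mixing.
    return alpha * spatial + (1.0 - alpha) * temporal

rng = np.random.default_rng(0)
z = rng.normal(size=(2, 4, 3, 8, 8))  # (batch, frames, channels, height, width)
w = rng.normal(size=(3, 3))           # toy frozen channel-mixing weights
k = np.full((4, 4), 0.25)             # toy temporal kernel: average of all frames

image_only = video_block(z, w, k, alpha=1.0)  # behaves like the image model
blended = video_block(z, w, k, alpha=0.0)     # every frame is the temporal average
```

With `alpha=1.0` the block reduces to the image model applied per frame, which is why such a model can start from image-only pre-training; fine-tuning on videos would then adjust the temporal parameters (here `k` and `alpha`).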
VideoComposer: Compositional Video Synthesis with Motion Controllability
The pursuit of controllability as a higher standard of visual content creation has yielded remarkable progress in customizable image synthesis.
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
To replicate the success of text-to-image (T2I) generation, recent works employ large-scale video datasets to train a text-to-video (T2V) generator.
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
The I2V model is designed to produce videos that strictly adhere to the content of the provided reference image, preserving its content, structure, and style.
Latte: Latent Diffusion Transformer for Video Generation
We propose a novel Latent Diffusion Transformer, namely Latte, for video generation.
MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration
Altogether, MUGEN can help progress research in many tasks in multimodal understanding and generation.
Make-A-Video: Text-to-Video Generation without Text-Video Data
We propose Make-A-Video -- an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V).
Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos
Generating text-editable and pose-controllable character videos is in high demand for creating various digital humans.
Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator
Text-to-video is a rapidly growing research area that aims to generate a semantically coherent, identity-consistent, and temporally coherent sequence of frames that accurately aligns with the input text prompt.