Text-to-Video Generation

49 papers with code • 6 benchmarks • 9 datasets

Text-to-video generation is the task of generating a video from a given sentence or sequence of words.

Most implemented papers

VideoComposer: Compositional Video Synthesis with Motion Controllability

ali-vilab/videocomposer NeurIPS 2023

The pursuit of controllability as a higher standard of visual content creation has yielded remarkable progress in customizable image synthesis.

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

showlab/Tune-A-Video ICCV 2023

To replicate the success of text-to-image (T2I) generation, recent works employ large-scale video datasets to train a text-to-video (T2V) generator.

ModelScope Text-to-Video Technical Report

exponentialml/text-to-video-finetuning 12 Aug 2023

This paper introduces ModelScopeT2V, a text-to-video synthesis model that evolves from a text-to-image synthesis model (i.e., Stable Diffusion).

VideoCrafter1: Open Diffusion Models for High-Quality Video Generation

ailab-cvc/videocrafter 30 Oct 2023

The I2V model is designed to produce videos that strictly adhere to the content of the provided reference image, preserving its content, structure, and style.

MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration

mugen-org/MUGEN_baseline 17 Apr 2022

Altogether, MUGEN can help progress research in many tasks in multimodal understanding and generation.

Make-A-Video: Text-to-Video Generation without Text-Video Data

lucidrains/make-a-video-pytorch 29 Sep 2022

We propose Make-A-Video -- an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V).

Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models

stability-ai/generative-models CVPR 2023

We first pre-train an LDM on images only; then, we turn the image generator into a video generator by introducing a temporal dimension to the latent space diffusion model and fine-tuning on encoded image sequences, i.e., videos.

Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator

soolab/free-bloom NeurIPS 2023

Text-to-video is a rapidly growing research area that aims to generate a sequence of frames that is semantically coherent, consistent in identity, and temporally coherent, and that accurately aligns with the input text prompt.

LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models

Vchitect/LaVie 26 Sep 2023

To this end, we propose LaVie, an integrated video generation framework that operates on cascaded video latent diffusion models, comprising a base T2V model, a temporal interpolation model, and a video super-resolution model.

StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter

GongyeLiu/StyleCrafter 1 Dec 2023

To address these challenges, we introduce StyleCrafter, a generic method that enhances pre-trained T2V models with a style control adapter, enabling video generation in any style by providing a reference image.