Text-to-Video Generation

40 papers with code • 5 benchmarks • 6 datasets

This task refers to generating a video from a natural-language description, such as a sentence or a sequence of words.
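
At the interface level, the task maps a text prompt to a clip of frames. A minimal sketch of that contract, assuming a hypothetical `model` callable and illustrative shapes (nothing here comes from a specific paper):

```python
import torch

def generate_video(model, prompt: str, num_frames: int = 16,
                   height: int = 256, width: int = 256) -> torch.Tensor:
    """Hypothetical T2V interface: a prompt in, RGB frames out.

    `model` is any callable implementing the mapping; the output is a
    (num_frames, 3, height, width) tensor with values in [0, 1].
    """
    frames = model(prompt, num_frames=num_frames, height=height, width=width)
    assert frames.shape == (num_frames, 3, height, width)  # sanity check
    return frames.clamp(0.0, 1.0)
```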

Most implemented papers

VideoComposer: Compositional Video Synthesis with Motion Controllability

ali-vilab/videocomposer NeurIPS 2023

The pursuit of controllability as a higher standard of visual content creation has yielded remarkable progress in customizable image synthesis.

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

showlab/Tune-A-Video ICCV 2023

To replicate the success of text-to-image (T2I) generation, recent works employ large-scale video datasets to train a text-to-video (T2V) generator.
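
Tune-A-Video instead adapts a pretrained T2I model on a single text-video pair. A minimal sketch of the one-shot-tuning idea: freeze the inflated UNet and train only a small parameter subset. The module names matched below (`temp_attn`, `to_q`) are assumptions, not the repo's exact identifiers:

```python
import torch.nn as nn

def select_one_shot_params(unet: nn.Module):
    """Freeze everything, then re-enable a small subset for tuning on
    a single text-video pair. Name patterns are illustrative."""
    for p in unet.parameters():
        p.requires_grad_(False)
    for name, module in unet.named_modules():
        # Assumed names: temporal attention blocks and attention query
        # projections are the typical one-shot-tuning targets.
        if "temp_attn" in name or name.endswith("to_q"):
            for p in module.parameters():
                p.requires_grad_(True)
    # Hand these to the optimizer, e.g. torch.optim.AdamW.
    return [p for p in unet.parameters() if p.requires_grad]
```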

ModelScope Text-to-Video Technical Report

exponentialml/text-to-video-finetuning 12 Aug 2023

This paper introduces ModelScopeT2V, a text-to-video synthesis model that evolves from a text-to-image synthesis model (i.e., Stable Diffusion).
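
The core move is inserting spatio-temporal blocks into the image model's UNet. A sketch of one such temporal block, assuming (B, C, T, H, W) activations; the zero-initialization trick (the block starts as an identity, preserving the pretrained image behavior) is a common convention, not necessarily ModelScopeT2V's exact recipe:

```python
import torch
import torch.nn as nn

class TemporalConvBlock(nn.Module):
    """Illustrative temporal block of the kind inserted into a T2I UNet
    to make it spatio-temporal. Input: (B, C, T, H, W)."""
    def __init__(self, channels: int):
        super().__init__()
        # 1D convolution over the frame axis only (kernel 3 in time).
        self.conv = nn.Conv3d(channels, channels,
                              kernel_size=(3, 1, 1), padding=(1, 0, 0))
        # Zero-init so the residual branch contributes nothing at first.
        nn.init.zeros_(self.conv.weight)
        nn.init.zeros_(self.conv.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.conv(x)

x = torch.randn(2, 64, 16, 32, 32)     # batch, channels, frames, H, W
print(TemporalConvBlock(64)(x).shape)  # torch.Size([2, 64, 16, 32, 32])
```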

MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration

mugen-org/MUGEN_baseline 17 Apr 2022

Altogether, MUGEN can help progress research in many tasks in multimodal understanding and generation.

Make-A-Video: Text-to-Video Generation without Text-Video Data

lucidrains/make-a-video-pytorch 29 Sep 2022

We propose Make-A-Video -- an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V).
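
The key trick is factorizing spatiotemporal computation so the spatial part can be learned from text-image pairs and the temporal part from unlabeled video. A simplified sketch of such a pseudo-3D convolution; the identity initialization of the temporal conv is in the spirit of the paper, but details here are an assumption:

```python
import torch
import torch.nn as nn

class Pseudo3DConv(nn.Module):
    """Sketch of a factorized (pseudo-3D) convolution: a 2D spatial conv
    reusable from a T2I model, followed by a 1D conv over frames."""
    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        self.spatial = nn.Conv2d(dim, dim, kernel_size, padding=kernel_size // 2)
        self.temporal = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2)
        # Identity-initialized temporal conv keeps single-image outputs
        # unchanged before any video training.
        nn.init.dirac_(self.temporal.weight)
        nn.init.zeros_(self.temporal.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, t, h, w = x.shape
        x = x.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
        x = self.spatial(x)                          # per-frame spatial conv
        x = x.reshape(b, t, c, h, w).permute(0, 3, 4, 2, 1)
        x = x.reshape(b * h * w, c, t)
        x = self.temporal(x)                         # per-pixel temporal conv
        return x.reshape(b, h, w, c, t).permute(0, 3, 4, 1, 2)

x = torch.randn(2, 32, 8, 16, 16)      # (batch, channels, frames, H, W)
print(Pseudo3DConv(32)(x).shape)       # torch.Size([2, 32, 8, 16, 16])
```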

Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models

stability-ai/generative-models CVPR 2023

We first pre-train an LDM on images only; then, we turn the image generator into a video generator by introducing a temporal dimension to the latent space diffusion model and fine-tuning on encoded image sequences, i.e., videos.
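
Concretely, new temporal layers are interleaved with the frozen spatial layers: frames are folded into the batch for the spatial layers and regrouped along the time axis for the temporal ones. A simplified sketch; the zero-initialized residual gate is illustrative (the paper learns a mixing factor between spatial and temporal outputs):

```python
import torch
import torch.nn as nn

class TemporalAttentionLayer(nn.Module):
    """Sketch of a temporal layer interleaved into a frozen image LDM.
    Spatial blocks see frames as extra batch items; this layer regroups
    tokens along the frame axis and attends across time."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))  # 0 at init: pure image model

    def forward(self, x: torch.Tensor, num_frames: int) -> torch.Tensor:
        bt, n, c = x.shape                        # (batch*frames, tokens, dim)
        b = bt // num_frames
        z = x.reshape(b, num_frames, n, c).transpose(1, 2)  # (b, n, t, c)
        z = z.reshape(b * n, num_frames, c)       # attend over the frame axis
        z, _ = self.attn(z, z, z)
        z = z.reshape(b, n, num_frames, c).transpose(1, 2).reshape(bt, n, c)
        return x + self.gate * z

x = torch.randn(2 * 16, 1024, 64)  # 2 clips of 16 frames, 32x32 latent tokens
print(TemporalAttentionLayer(64)(x, num_frames=16).shape)
```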

LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models

Vchitect/LaVie 26 Sep 2023

To this end, we propose LaVie, an integrated video generation framework that operates on cascaded video latent diffusion models, comprising a base T2V model, a temporal interpolation model, and a video super-resolution model.
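
The cascade reads naturally as three stages applied in sequence. A sketch with placeholder callables for the three models; the shapes in the comments are illustrative, not LaVie's exact configuration:

```python
import torch

def cascaded_t2v(base_t2v, interpolate, upsample, prompt: str) -> torch.Tensor:
    """Sketch of a LaVie-style cascade: a base model produces a short
    low-resolution clip, a temporal interpolation model raises the frame
    rate, and a video super-resolution model raises the spatial
    resolution. All three callables are placeholders."""
    clip = base_t2v(prompt)    # e.g. (16, 3, 320, 512) key frames
    clip = interpolate(clip)   # e.g. (61, 3, 320, 512) smoother motion
    clip = upsample(clip)      # e.g. (61, 3, 1280, 2048) final video
    return clip
```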

StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter

GongyeLiu/StyleCrafter 1 Dec 2023

To address these challenges, we introduce StyleCrafter, a generic method that enhances pre-trained T2V models with a style control adapter, enabling video generation in any style by providing a reference image.
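
One way to read "style control adapter": keep the pretrained text cross-attention frozen and add a parallel cross-attention over style tokens extracted from the reference image, blended in with a learnable scale. The layout below is a hedged sketch, not StyleCrafter's exact module:

```python
import torch
import torch.nn as nn

class StyleAdapterAttention(nn.Module):
    """Illustrative dual cross-attention: text conditioning plus an
    added style branch fed by reference-image tokens."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.text_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.style_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.scale = nn.Parameter(torch.zeros(1))  # 0 at init: text-only

    def forward(self, x, text_tokens, style_tokens):
        out_text, _ = self.text_attn(x, text_tokens, text_tokens)
        out_style, _ = self.style_attn(x, style_tokens, style_tokens)
        return x + out_text + self.scale * out_style

x = torch.randn(2, 256, 64)       # latent tokens
text = torch.randn(2, 77, 64)     # text encoder tokens
style = torch.randn(2, 16, 64)    # tokens from the reference image
print(StyleAdapterAttention(64)(x, text, style).shape)
```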

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

ailab-cvc/videocrafter 17 Jan 2024

Based on this stronger coupling between spatial and temporal modules, we shift the distribution to higher quality without motion degradation by finetuning spatial modules with high-quality images, resulting in a generic high-quality video model.
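
In code, that step amounts to partitioning parameters: temporal (motion) modules stay frozen while spatial modules are finetuned on high-quality images treated as single-frame clips. The name-based split below is an assumption about how a checkpoint labels its modules:

```python
import torch.nn as nn

def spatial_only_finetune(unet: nn.Module):
    """Freeze temporal modules and mark spatial ones as trainable for
    image finetuning. The 'temporal' substring match is illustrative."""
    for name, param in unet.named_parameters():
        if "temporal" in name:        # motion layers stay fixed
            param.requires_grad_(False)
        else:                         # spatial layers absorb image quality
            param.requires_grad_(True)
    return [p for p in unet.parameters() if p.requires_grad]
```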

Sync-DRAW: Automatic Video Generation using Deep Recurrent Attentive Architectures

Singularity42/Sync-DRAW 30 Nov 2016

This paper introduces a novel approach for generating videos called Synchronized Deep Recurrent Attentive Writer (Sync-DRAW).
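
Sync-DRAW predates diffusion models: frames are produced by a recurrent, attention-based canvas refined over several steps. A heavily simplified sketch of the recurrent-canvas idea (no read attention or VAE, unlike the actual model):

```python
import torch
import torch.nn as nn

class RecurrentWriter(nn.Module):
    """DRAW-flavoured sketch: an LSTM emits additive 'writes' that
    accumulate on a per-frame canvas, refining all frames jointly."""
    def __init__(self, latent: int, frames: int, frame_dim: int, steps: int = 8):
        super().__init__()
        self.steps = steps
        self.rnn = nn.LSTMCell(latent, latent)
        self.write = nn.Linear(latent, frames * frame_dim)
        self.frames, self.frame_dim = frames, frame_dim

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        b = z.size(0)
        h = c = torch.zeros(b, z.size(1), device=z.device)
        canvas = torch.zeros(b, self.frames, self.frame_dim, device=z.device)
        for _ in range(self.steps):
            h, c = self.rnn(z, (h, c))
            canvas = canvas + self.write(h).view(b, self.frames, self.frame_dim)
        return torch.sigmoid(canvas)

z = torch.randn(4, 32)                               # one latent per video
writer = RecurrentWriter(latent=32, frames=10, frame_dim=64 * 64)
print(writer(z).shape)                               # torch.Size([4, 10, 4096])
```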