Text-to-Video Generation

49 papers with code • 6 benchmarks • 9 datasets

Text-to-video generation is the task of synthesizing a video from a given sentence or sequence of words.

Most implemented papers

Latte: Latent Diffusion Transformer for Video Generation

maxin-cn/Latte 5 Jan 2024

We propose a novel Latent Diffusion Transformer, namely Latte, for video generation.
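Latte operates on a latent video representation that is split into spatio-temporal tokens for a Transformer. As a minimal sketch (not the paper's implementation; the patch size and latent dimensions here are illustrative), the token count for such a layout can be computed as:

```python
def latte_token_layout(frames, height, width, patch=2):
    """Count spatio-temporal tokens when a latent video of shape
    (frames, height, width) is split into patch x patch spatial patches."""
    assert height % patch == 0 and width % patch == 0
    return frames * (height // patch) * (width // patch)

# e.g. a 16-frame 32x32 latent with 2x2 patches -> 16 * 16 * 16 tokens
print(latte_token_layout(16, 32, 32))  # 4096
```

The quadratic cost of attention over this token sequence is why the paper explores factorized spatial/temporal Transformer variants.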

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

ailab-cvc/videocrafter 17 Jan 2024

Leveraging a stronger coupling between spatial and temporal modules, the model shifts the output distribution toward higher quality without motion degradation by finetuning the spatial modules on high-quality images, yielding a generic high-quality video model.

MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators

pku-yuangroup/magictime 7 Apr 2024

Recent advances in Text-to-Video generation (T2V) have achieved remarkable success in synthesizing high-quality general videos from textual descriptions.

Sync-DRAW: Automatic Video Generation using Deep Recurrent Attentive Architectures

Singularity42/Sync-DRAW 30 Nov 2016

This paper introduces a novel approach for generating videos called Synchronized Deep Recurrent Attentive Writer (Sync-DRAW).

GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions

mehdidc/DALLE_clip_score 30 Apr 2021

Generating videos from text is challenging due to the high computational cost of training and the unbounded space of plausible outputs, which complicates evaluation.

NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion

lucidrains/nuwa-pytorch 24 Nov 2021

To cover language, image, and video within a single framework for different scenarios, a 3D transformer encoder-decoder is designed that handles videos as 3D data and adapts to texts and images as 1D and 2D data, respectively.
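The unifying idea is that text, images, and videos can all be laid out on one 3D token grid by inserting size-1 axes. A minimal sketch (the axis assignment below is an illustrative assumption, not NÜWA's exact convention):

```python
def as_3d_grid(kind, *dims):
    """Embed text (1D), image (2D), and video (3D) token layouts into a
    common (T, H, W) grid by padding the missing axes with size 1."""
    if kind == "text":               # L tokens -> (1, 1, L)
        (L,) = dims
        return (1, 1, L)
    if kind == "image":              # H x W patch grid -> (1, H, W)
        H, W = dims
        return (1, H, W)
    if kind == "video":              # T frames of H x W patches
        T, H, W = dims
        return (T, H, W)
    raise ValueError(f"unknown modality: {kind}")

print(as_3d_grid("text", 77))          # (1, 1, 77)
print(as_3d_grid("video", 8, 16, 16))  # (8, 16, 16)
```

With every modality expressed as a 3D grid, a single encoder-decoder can attend over all of them uniformly.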

CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers

thudm/cogvideo 29 May 2022

Large-scale pretrained transformers have created milestones in text (GPT-3) and text-to-image (DALL-E and CogView) generation.

Latent Video Diffusion Models for High-Fidelity Long Video Generation

yingqinghe/lvdm 23 Nov 2022

Diffusion models have shown remarkable results recently but require significant computational resources.

Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation

tsujuifu/pytorch_tvc CVPR 2023

This paper introduces a novel task, text-guided video completion (TVC), which requires the model to generate a video from partial frames guided by a textual instruction.

MAGVIT: Masked Generative Video Transformer

google-research/magvit CVPR 2023

We introduce the MAsked Generative VIdeo Transformer, MAGVIT, to tackle various video synthesis tasks with a single model.
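MAGVIT trains on discrete video tokens corrupted by masking, with the Transformer predicting the masked positions. A minimal BERT-style masking sketch (the mask id and ratio are illustrative assumptions, not the paper's exact schedule):

```python
import random

MASK_ID = -1  # placeholder id for masked positions (illustrative)

def mask_video_tokens(tokens, mask_ratio, rng):
    """Replace a random subset of discrete video token ids with MASK_ID;
    the generative transformer is trained to predict the originals."""
    k = int(len(tokens) * mask_ratio)
    masked_idx = rng.sample(range(len(tokens)), k)
    corrupted = list(tokens)
    for i in masked_idx:
        corrupted[i] = MASK_ID
    return corrupted, sorted(masked_idx)

corrupted, idx = mask_video_tokens(list(range(16)), 0.5, random.Random(0))
print(len(idx))  # 8
```

Varying the masking pattern (e.g. masking whole frames or spatial regions) is what lets one model cover multiple synthesis tasks.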