Text-to-Video Generation

19 papers with code • 3 benchmarks • 4 datasets

This task refers to generating a video conditioned on a given sentence or sequence of words.

Most implemented papers

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

showlab/Tune-A-Video 22 Dec 2022

To replicate the success of text-to-image (T2I) generation, recent works employ large-scale video datasets to train a text-to-video (T2V) generator.

MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration

mugen-org/MUGEN_baseline 17 Apr 2022

Altogether, MUGEN can help advance research on many tasks in multimodal understanding and generation.

Make-A-Video: Text-to-Video Generation without Text-Video Data

lucidrains/make-a-video-pytorch 29 Sep 2022

We propose Make-A-Video -- an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V).

Sync-DRAW: Automatic Video Generation using Deep Recurrent Attentive Architectures

Singularity42/Sync-DRAW 30 Nov 2016

This paper introduces a novel approach for generating videos called Synchronized Deep Recurrent Attentive Writer (Sync-DRAW).

GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions

mehdidc/DALLE_clip_score 30 Apr 2021

Generating videos from text is challenging due to the high computational cost of training and the unbounded space of plausible outputs, which makes evaluation difficult.

NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion

lucidrains/nuwa-pytorch 24 Nov 2021

To cover language, image, and video at the same time for different scenarios, a 3D transformer encoder-decoder framework is designed that not only handles videos as 3D data but also adapts to texts and images as 1D and 2D data, respectively.
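The unifying idea of treating texts, images, and videos as 1D, 2D, and 3D token grids can be illustrated with a minimal sketch: each modality, whatever its dimensionality, flattens into a single token sequence that one encoder-decoder could consume. The shapes and vocabulary size below are illustrative assumptions, not NÜWA's actual configuration.

```python
import numpy as np

def flatten_to_tokens(grid: np.ndarray) -> np.ndarray:
    """Flatten a 1D/2D/3D token grid into a single token sequence."""
    return grid.reshape(-1)

vocab = 8192  # illustrative codebook size, not NÜWA's
text  = np.random.randint(vocab, size=(16,))      # 1D: 16 word tokens
image = np.random.randint(vocab, size=(8, 8))     # 2D: 8x8 patch tokens
video = np.random.randint(vocab, size=(4, 8, 8))  # 3D: 4 frames of 8x8 patches

for name, grid in [("text", text), ("image", image), ("video", video)]:
    seq = flatten_to_tokens(grid)
    print(name, grid.shape, "->", seq.shape)
```

Once flattened, all three modalities look identical to the model: a sequence of discrete tokens, differing only in length.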

CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers

thudm/cogvideo 29 May 2022

Large-scale pretrained transformers have created milestones in text (GPT-3) and text-to-image (DALL-E and CogView) generation.

Latent Video Diffusion Models for High-Fidelity Long Video Generation

yingqinghe/lvdm 23 Nov 2022

Diffusion models have shown remarkable results recently but require significant computational resources.

Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation

tsujuifu/pytorch_tvc CVPR 2023

We introduce a novel task, text-guided video completion (TVC), which asks a model to generate a video from partial frames guided by an instruction.

MAGVIT: Masked Generative Video Transformer

google-research/magvit CVPR 2023

We introduce the MAsked Generative VIdeo Transformer, MAGVIT, to tackle various video synthesis tasks with a single model.
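The general idea behind masked generative video modeling can be sketched as follows: replace a random subset of a 3D video-token grid with a mask id, then train a model to predict the original tokens at the masked positions. All names, sizes, and the mask ratio below are illustrative assumptions, not MAGVIT's actual design.

```python
import numpy as np

MASK_ID = -1  # illustrative sentinel for masked positions

def mask_tokens(tokens: np.ndarray, ratio: float, rng) -> tuple:
    """Randomly mask `ratio` of the tokens; return the masked grid and the boolean mask."""
    mask = rng.random(tokens.shape) < ratio
    masked = np.where(mask, MASK_ID, tokens)
    return masked, mask

rng = np.random.default_rng(0)
video_tokens = rng.integers(0, 1024, size=(4, 8, 8))  # frames x height x width
masked, mask = mask_tokens(video_tokens, ratio=0.5, rng=rng)
# A transformer would now be trained to predict video_tokens[mask]
# given the partially masked grid `masked`.
```

Varying which positions are masked (e.g., all of the future frames, or the interior of each frame) turns the same objective into different synthesis tasks, which is what lets one model cover several of them.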