Text-to-Video Generation
19 papers with code • 3 benchmarks • 4 datasets
Text-to-video generation is the task of synthesizing a video conditioned on a natural-language description, such as a sentence or sequence of words.
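The task interface can be sketched as a function from a prompt to a clip. The sketch below is purely illustrative (the `generate_video` stub and its noise output are hypothetical, not any paper's method); it only shows the conventional (frames, height, width, channels) output layout that real T2V models produce.

```python
import numpy as np

def generate_video(prompt: str, num_frames: int = 16,
                   height: int = 64, width: int = 64) -> np.ndarray:
    """Toy stand-in for a text-to-video generator (illustration only).

    A real model would condition on the prompt; here we just derive an RNG
    seed from it and emit noise frames in the standard clip layout.
    """
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    # Clip tensor: (num_frames, height, width, 3) uint8 RGB frames.
    frames = rng.integers(0, 256, size=(num_frames, height, width, 3),
                          dtype=np.uint8)
    return frames

clip = generate_video("a dog surfing a wave", num_frames=8)
print(clip.shape)  # (8, 64, 64, 3)
```

Evaluating such a generator is harder than producing this tensor: as several of the papers below note, many different clips can be valid for one prompt.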
Most implemented papers
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
To replicate the success of text-to-image (T2I) generation, recent works employ large-scale video datasets to train a text-to-video (T2V) generator.
MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration
Altogether, MUGEN can help progress research in many tasks in multimodal understanding and generation.
Make-A-Video: Text-to-Video Generation without Text-Video Data
We propose Make-A-Video -- an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V).
Sync-DRAW: Automatic Video Generation using Deep Recurrent Attentive Architectures
This paper introduces a novel approach for generating videos called Synchronized Deep Recurrent Attentive Writer (Sync-DRAW).
GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions
Generating videos from text is challenging due to the high computational cost of training and the open-ended space of plausible outputs, which complicates evaluation.
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
To cover language, image, and video at the same time for different scenarios, a 3D transformer encoder-decoder framework is designed, which can not only deal with videos as 3D data but also adapt to texts and images as 1D and 2D data, respectively.
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
Large-scale pretrained transformers have created milestones in text (GPT-3) and text-to-image (DALL-E and CogView) generation.
Latent Video Diffusion Models for High-Fidelity Long Video Generation
Diffusion models have shown remarkable results recently but require significant computational resources.
Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation
We introduce a novel task, text-guided video completion (TVC), which requires the model to generate a video from partial frames guided by an instruction.
MAGVIT: Masked Generative Video Transformer
We introduce the MAsked Generative VIdeo Transformer, MAGVIT, to tackle various video synthesis tasks with a single model.