Text-to-Video Generation

97 papers with code • 6 benchmarks • 14 datasets

Most implemented papers

ModelScope Text-to-Video Technical Report

exponentialml/text-to-video-finetuning 12 Aug 2023

This paper introduces ModelScopeT2V, a text-to-video synthesis model that evolves from a text-to-image synthesis model (i.e., Stable Diffusion).
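
As a quick orientation, the sketch below shows how such a checkpoint might be run for inference through the Hugging Face diffusers port; the model id, the output layout, and the sampling parameters are assumptions about the hosted weights and installed library version, not part of the report itself.

```python
# Minimal inference sketch via the diffusers port of the ModelScope checkpoint;
# "damo-vilab/text-to-video-ms-1.7b" and the output layout depend on the hosted
# weights and the installed diffusers version (assumptions, not the paper's code).
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16
).to("cuda")

result = pipe("a panda surfing a wave", num_frames=16, num_inference_steps=25)
# Recent diffusers releases return one list of frames per prompt in `result.frames`.
export_to_video(result.frames[0], "panda.mp4")
```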

Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models

stability-ai/generative-models CVPR 2023

We first pre-train an LDM on images only; then, we turn the image generator into a video generator by introducing a temporal dimension to the latent space diffusion model and fine-tuning on encoded image sequences, i.e., videos.
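
The core move, inflating a pretrained image model with a temporal dimension while keeping the spatial weights frozen, can be sketched roughly as follows; the module names and shapes are illustrative assumptions, not the paper's actual code.

```python
# Illustrative sketch (not the paper's code): a frozen spatial layer from the
# image LDM followed by a trainable attention layer over the frame axis.
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Self-attention across frames, applied independently at each spatial location."""
    def __init__(self, channels, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):
        # x: (batch, channels, frames, height, width)
        b, c, t, h, w = x.shape
        tokens = x.permute(0, 3, 4, 2, 1).reshape(b * h * w, t, c)  # one sequence per pixel
        attended, _ = self.attn(self.norm(tokens), self.norm(tokens), self.norm(tokens))
        tokens = tokens + attended                                   # residual connection
        return tokens.reshape(b, h, w, t, c).permute(0, 4, 3, 1, 2)

class VideoBlock(nn.Module):
    """Frozen 2D layer from the image model interleaved with a trainable temporal layer."""
    def __init__(self, spatial_layer, channels):
        super().__init__()
        self.spatial = spatial_layer
        for p in self.spatial.parameters():
            p.requires_grad_(False)          # keep the pretrained image weights fixed
        self.temporal = TemporalAttention(channels)

    def forward(self, x):
        b, c, t, h, w = x.shape
        # Apply the pretrained 2D layer to each frame independently.
        frames = x.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
        frames = self.spatial(frames)
        x = frames.reshape(b, t, c, h, w).permute(0, 2, 1, 3, 4)
        return self.temporal(x)
```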

VideoComposer: Compositional Video Synthesis with Motion Controllability

ali-vilab/videocomposer NeurIPS 2023

The pursuit of controllability as a higher standard of visual content creation has yielded remarkable progress in customizable image synthesis.

Latte: Latent Diffusion Transformer for Video Generation

maxin-cn/Latte 5 Jan 2024

We propose a novel Latent Diffusion Transformer, namely Latte, for video generation.
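
A rough sketch of the idea, treating video latents as spatio-temporal patch tokens processed by alternating spatial and temporal Transformer blocks, might look like this; every class name and hyperparameter here is an illustrative assumption rather than the released implementation.

```python
# Illustrative sketch of a latent video Transformer (not the released Latte code):
# patchify video latents into tokens, then alternate spatial / temporal attention.
import torch
import torch.nn as nn

class AlternatingBlock(nn.Module):
    def __init__(self, dim, heads=8):
        super().__init__()
        self.spatial = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.temporal = nn.TransformerEncoderLayer(dim, heads, batch_first=True)

    def forward(self, tokens, t, n):
        # tokens: (batch, t * n, dim) with t frames of n spatial patches each
        b, _, d = tokens.shape
        x = self.spatial(tokens.reshape(b * t, n, d))                # attend within each frame
        x = x.reshape(b, t, n, d).transpose(1, 2).reshape(b * n, t, d)
        x = self.temporal(x)                                         # attend across frames
        return x.reshape(b, n, t, d).transpose(1, 2).reshape(b, t * n, d)

class LatentVideoTransformer(nn.Module):
    def __init__(self, in_channels=4, patch=2, dim=512, depth=4, frames=16, size=32):
        super().__init__()
        self.patchify = nn.Conv2d(in_channels, dim, kernel_size=patch, stride=patch)
        self.n = (size // patch) ** 2
        self.pos = nn.Parameter(torch.zeros(1, frames * self.n, dim))
        self.blocks = nn.ModuleList(AlternatingBlock(dim) for _ in range(depth))

    def forward(self, z):
        # z: (batch, channels, frames, height, width) latent video
        b, c, t, h, w = z.shape
        x = self.patchify(z.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w))
        x = x.flatten(2).transpose(1, 2).reshape(b, t * self.n, -1) + self.pos
        for blk in self.blocks:
            x = blk(x, t, self.n)
        return x
```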

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

showlab/Tune-A-Video ICCV 2023

To replicate the success of text-to-image (T2I) generation, recent works employ large-scale video datasets to train a text-to-video (T2V) generator.

VideoCrafter1: Open Diffusion Models for High-Quality Video Generation

ailab-cvc/videocrafter 30 Oct 2023

The I2V model is designed to produce videos that strictly adhere to the provided reference image, preserving its content, structure, and style.

StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter

GongyeLiu/StyleCrafter 1 Dec 2023

To address these challenges, we introduce StyleCrafter, a generic method that enhances pre-trained T2V models with a style control adapter, enabling video generation in any style by providing a reference image.
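
One way such a style adapter could plug into a frozen T2V backbone is sketched below; the encoder, token counts, and injection point are assumptions made for illustration, not StyleCrafter's actual design details.

```python
# Illustrative sketch (not StyleCrafter's code): a style adapter that turns a
# reference image into extra conditioning tokens for a frozen T2V backbone.
import torch
import torch.nn as nn

class StyleAdapter(nn.Module):
    def __init__(self, image_feature_dim=1024, context_dim=768, num_style_tokens=8):
        super().__init__()
        # Learnable queries pull style information out of frozen image features.
        self.queries = nn.Parameter(torch.randn(1, num_style_tokens, context_dim))
        self.project = nn.Linear(image_feature_dim, context_dim)
        self.attn = nn.MultiheadAttention(context_dim, num_heads=8, batch_first=True)
        self.scale = nn.Parameter(torch.tensor(1.0))  # balances text vs. style guidance

    def forward(self, image_features, text_context):
        # image_features: (batch, patches, image_feature_dim) from a frozen image encoder
        # text_context:   (batch, tokens, context_dim) text embeddings for cross-attention
        keys = self.project(image_features)
        style_tokens, _ = self.attn(
            self.queries.expand(image_features.size(0), -1, -1), keys, keys
        )
        # Concatenate so the backbone's cross-attention sees text and style together.
        return torch.cat([text_context, self.scale * style_tokens], dim=1)
```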

MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration

mugen-org/MUGEN_baseline 17 Apr 2022

Altogether, MUGEN can help progress research in many tasks in multimodal understanding and generation.

Make-A-Video: Text-to-Video Generation without Text-Video Data

lucidrains/make-a-video-pytorch 29 Sep 2022

We propose Make-A-Video -- an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V).

VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation

modelscope/modelscope CVPR 2023

A diffusion probabilistic model (DPM), which constructs a forward diffusion process by gradually adding noise to data points and learns the reverse denoising process to generate new samples, has been shown to handle complex data distributions.
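
The forward (noising) half of that process has a simple closed form, sketched below with an illustrative linear schedule rather than VideoFusion's actual configuration.

```python
# Minimal sketch of the DPM forward (noising) process described above; the
# schedule values and tensor shapes are illustrative, not VideoFusion's setup.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative signal retention

def q_sample(x0, t, noise):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a_bar_t) x_0, (1 - a_bar_t) I) in closed form."""
    a_bar = alphas_bar[t].view(-1, *([1] * (x0.dim() - 1)))
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

# Training pairs: the denoiser is trained to predict `noise` from (x_t, t).
x0 = torch.randn(2, 4, 16, 32, 32)               # e.g. a batch of video latents
t = torch.randint(0, T, (2,))
noise = torch.randn_like(x0)
x_t = q_sample(x0, t, noise)
```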