Text-to-Video Generation
97 papers with code • 6 benchmarks • 14 datasets
Most implemented papers
ModelScope Text-to-Video Technical Report
This paper introduces ModelScopeT2V, a text-to-video synthesis model that evolves from a text-to-image synthesis model (i.e., Stable Diffusion).
Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
We first pre-train an LDM on images only; then, we turn the image generator into a video generator by introducing a temporal dimension to the latent space diffusion model and fine-tuning on encoded image sequences, i.e., videos.
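A minimal PyTorch sketch of this "inflation" idea, assuming a (B, C, T, H, W) latent layout; the module names and the choice of temporal self-attention are illustrative and not the paper's actual implementation:

```python
# Illustrative sketch (not the paper's code): interleaving a newly added
# temporal layer with a frozen spatial block taken from an image LDM.
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Self-attention over the frame axis, applied independently per spatial location."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):  # x: (B, C, T, H, W)
        b, c, t, h, w = x.shape
        # Fold spatial positions into the batch so attention only mixes frames.
        seq = x.permute(0, 3, 4, 2, 1).reshape(b * h * w, t, c)
        q = self.norm(seq)
        out, _ = self.attn(q, q, q)
        out = (seq + out).reshape(b, h, w, t, c).permute(0, 4, 3, 1, 2)
        return out

class InflatedBlock(nn.Module):
    """Pre-trained (frozen) spatial block followed by a newly added temporal layer."""
    def __init__(self, spatial_block, channels):
        super().__init__()
        self.spatial = spatial_block                  # from the image LDM, applied per frame
        self.temporal = TemporalAttention(channels)   # trained on encoded video latents

    def forward(self, x):  # x: (B, C, T, H, W)
        b, c, t, h, w = x.shape
        frames = x.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
        frames = self.spatial(frames)
        x = frames.reshape(b, t, c, h, w).permute(0, 2, 1, 3, 4)
        return self.temporal(x)

# Usage sketch: block = InflatedBlock(nn.Conv2d(4, 4, 3, padding=1), channels=4)
```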
VideoComposer: Compositional Video Synthesis with Motion Controllability
The pursuit of controllability as a higher standard of visual content creation has yielded remarkable progress in customizable image synthesis.
Latte: Latent Diffusion Transformer for Video Generation
We propose a novel Latent Diffusion Transformer, namely Latte, for video generation.
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
To replicate the success of text-to-image (T2I) generation, recent works employ large-scale video datasets to train a text-to-video (T2V) generator.
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
The I2V model is designed to produce videos that strictly adhere to the content of the provided reference image, preserving its content, structure, and style.
StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter
To address these challenges, we introduce StyleCrafter, a generic method that enhances pre-trained T2V models with a style control adapter, enabling video generation in any style by providing a reference image.
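A hedged sketch of the general adapter pattern this describes: features from a style reference image are projected into extra conditioning tokens for a frozen T2V backbone. The image encoder, token count, and how the base model consumes the tokens are assumptions for illustration, not StyleCrafter's actual design:

```python
# Generic style-adapter sketch (assumed design, not StyleCrafter's code):
# a small trainable projector maps a reference-image embedding to extra
# cross-attention tokens while the pre-trained T2V backbone stays frozen.
import torch
import torch.nn as nn

class StyleAdapter(nn.Module):
    def __init__(self, image_feat_dim=768, cond_dim=1024, num_tokens=4):
        super().__init__()
        self.proj = nn.Linear(image_feat_dim, cond_dim * num_tokens)
        self.num_tokens = num_tokens
        self.cond_dim = cond_dim

    def forward(self, style_features):  # (B, image_feat_dim), e.g. a CLIP image embedding
        tokens = self.proj(style_features)
        return tokens.view(-1, self.num_tokens, self.cond_dim)

# Usage sketch: concatenate style tokens with the text tokens fed to cross-attention.
# text_tokens: (B, L, cond_dim); style_feat: (B, image_feat_dim)
# cond = torch.cat([text_tokens, StyleAdapter()(style_feat)], dim=1)
```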
MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration
Altogether, MUGEN can help progress research in many tasks in multimodal understanding and generation.
Make-A-Video: Text-to-Video Generation without Text-Video Data
We propose Make-A-Video -- an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V).
VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation
A diffusion probabilistic model (DPM) constructs a forward diffusion process by gradually adding noise to data points and learns the reverse denoising process to generate new samples; it has been shown to handle complex data distributions.
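A minimal sketch of the generic DPM training recipe summarized above (plain epsilon-prediction DDPM, not VideoFusion's decomposed formulation); the noise schedule values and the `model(x_t, t)` interface are assumptions:

```python
# Forward process adds Gaussian noise over T steps; the reverse process is
# learned by training a network to predict the noise added at a random step.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)                 # assumed linear noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def q_sample(x0, t, noise):
    """Forward process: jump straight to step t by mixing clean data with noise."""
    a = alphas_cumprod[t].to(x0.device).view(-1, 1, 1, 1)   # x0: (B, C, H, W)
    return a.sqrt() * x0 + (1.0 - a).sqrt() * noise

def training_step(model, x0):
    """One denoising-training step: the network predicts the injected noise."""
    t = torch.randint(0, T, (x0.shape[0],))
    noise = torch.randn_like(x0)
    x_t = q_sample(x0, t, noise)
    pred = model(x_t, t)                              # assumed epsilon-prediction network
    return torch.nn.functional.mse_loss(pred, noise)
```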