Story Visualization
19 papers with code • 3 benchmarks • 1 dataset
Story Visualization is the task of generating a coherent sequence of images aligned with a sequence of textual captions that describe a story. It mainly consists of two tasks: story generation and story continuation, where story continuation additionally receives ground-truth information in the form of the first frame.
Latest papers
Masked Generative Story Transformer with Character Guidance and Caption Augmentation
Story Visualization (SV) is a challenging generative vision task that requires both visual quality and consistency between different frames in generated image sequences.
Training-Free Consistent Text-to-Image Generation
Text-to-image models offer a new level of creative flexibility by allowing users to guide the image generation process through natural language.
StoryGPT-V: Large Language Models as Consistent Story Visualizers
Therefore, we introduce StoryGPT-V, which leverages the merits of the latent diffusion (LDM) and LLM to produce images with consistent and high-quality characters grounded on given story descriptions.
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
Our quantitative analysis demonstrates that our method strikes a better balance between prompt alignment and identity consistency compared to the baseline methods, and these findings are reinforced by a user study.
Story Visualization by Online Text Augmentation with Context Memory
Story visualization (SV) is a challenging text-to-image generation task because it requires not only rendering visual details from the text descriptions but also encoding a long-term context across multiple sentences.
Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models
Generative models have recently exhibited exceptional capabilities in text-to-image generation, but still struggle to generate image sequences coherently.
TaleCrafter: Interactive Story Visualization with Multiple Characters
Accurate story visualization requires several necessary elements, such as identity consistency across frames, the alignment between plain text and visual content, and a reasonable layout of objects in images.
Make-A-Story: Visual Memory Conditioned Consistent Story Generation
Our experiments for story generation on the MUGEN, the PororoSV and the FlintstonesSV datasets show that our method not only outperforms prior state-of-the-art in generating frames with high visual quality, which are consistent with the story, but also models appropriate correspondences between the characters and the background.
Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models
Conditioned diffusion models have demonstrated state-of-the-art text-to-image synthesis capacity.
Character-Centric Story Visualization via Visual Planning and Token Alignment
This task requires machines to 1) understand long text inputs and 2) produce a globally consistent image sequence that illustrates the contents of the story.