Visual Storytelling
25 papers with code • 1 benchmark • 4 datasets
(Image credit: No Metrics Are Perfect)
Latest papers
Gorgeous: Create Your Desired Character Facial Makeup from Any Ideas
Contemporary makeup transfer methods primarily focus on replicating makeup from one face to another, considerably limiting their use in creating diverse and creative character makeup essential for visual storytelling.
inkn'hue: Enhancing Manga Colorization from Multiple Priors with Alignment Multi-Encoder VAE
Yet, the desire to experience manga in vibrant colors has sparked the pursuit of manga colorization, a task of paramount significance for artists.
GROOViST: A Metric for Grounding Objects in Visual Storytelling
A proper evaluation of stories generated for a sequence of images -- the task commonly referred to as visual storytelling -- must consider multiple aspects, such as coherence, grammatical correctness, and visual grounding.
Envisioning Narrative Intelligence: A Creative Visual Storytelling Anthology
In this paper, we collect an anthology of 100 visual stories from authors who participated in our systematic creative process of improvised story-building based on image sequences.
TouchStone: Evaluating Vision-Language Models by Language Models
Large vision-language models (LVLMs) have recently witnessed rapid advancements, exhibiting a remarkable capacity for perceiving, understanding, and processing visual information by connecting visual receptor with large language models (LLMs).
Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation
For the first module, we leverage an off-the-shelf video retrieval system and extract video depths as motion structure.
Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models
Generative models have recently exhibited exceptional capabilities in text-to-image generation, but still struggle to generate image sequences coherently.
Detecting and Grounding Important Characters in Visual Stories
Characters are essential to the plot of any story.
Positional Diffusion: Ordering Unordered Sets with Diffusion Probabilistic Models
We present Positional Diffusion, a plug-and-play graph formulation with Diffusion Probabilistic Models to address positional reasoning.
Expressive Scene Graph Generation Using Commonsense Knowledge Infusion for Visual Understanding and Reasoning
These results depict the effectiveness of commonsense knowledge infusion in improving the performance and expressiveness of scene graph generation for visual understanding and reasoning tasks.