Visual Storytelling
25 papers with code • 1 benchmark • 4 datasets
(Image credit: No Metrics Are Perfect)
Most implemented papers
Knowledge-Enriched Visual Storytelling
This paper introduces KG-Story, a three-stage framework that allows the story generation model to take advantage of external Knowledge Graphs to produce interesting stories.
Plot and Rework: Modeling Storylines for Visual Storytelling
Writing a coherent and engaging story is not easy.
RoViST: Learning Robust Metrics for Visual Storytelling
We measure the reliability of our metric sets by analysing their correlation with human judgement scores on a sample of machine stories obtained from four state-of-the-art models trained on the Visual Storytelling Dataset (VIST).
Expressive Scene Graph Generation Using Commonsense Knowledge Infusion for Visual Understanding and Reasoning
These results demonstrate the effectiveness of commonsense knowledge infusion in improving the performance and expressiveness of scene graph generation for visual understanding and reasoning tasks.
Positional Diffusion: Ordering Unordered Sets with Diffusion Probabilistic Models
We present Positional Diffusion, a plug-and-play graph formulation with Diffusion Probabilistic Models to address positional reasoning.
Detecting and Grounding Important Characters in Visual Stories
Characters are essential to the plot of any story.
Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models
Generative models have recently exhibited exceptional capabilities in text-to-image generation, but still struggle to generate image sequences coherently.
Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation
For the first module, we leverage an off-the-shelf video retrieval system and extract video depths as motion structure.
TouchStone: Evaluating Vision-Language Models by Language Models
Large vision-language models (LVLMs) have recently witnessed rapid advancements, exhibiting a remarkable capacity for perceiving, understanding, and processing visual information by connecting a visual receptor with large language models (LLMs).
Envisioning Narrative Intelligence: A Creative Visual Storytelling Anthology
In this paper, we collect an anthology of 100 visual stories from authors who participated in our systematic creative process of improvised story-building based on image sequences.
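
The RoViST entry above describes checking a metric's reliability through its correlation with human judgement scores. A minimal sketch of that kind of check follows, assuming one automatic score and one human rating per story. The function names and the sample numbers are hypothetical illustrations and are not taken from the paper, which may use a different correlation measure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


if __name__ == "__main__":
    # Hypothetical per-story scores, for illustration only.
    metric_scores = [0.62, 0.48, 0.71, 0.35, 0.55]
    human_scores = [3.5, 3.0, 4.5, 2.0, 3.0]
    print("Pearson: ", round(pearson(metric_scores, human_scores), 3))
    print("Spearman:", round(spearman(metric_scores, human_scores), 3))
```

Spearman is included alongside Pearson because human ratings are often ordinal, so agreement in ranking can be more informative than a linear fit. Which measure suits a given evaluation is a judgement call.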