Visual Storytelling
24 papers with code • 1 benchmarks • 4 datasets
( Image credit: No Metrics Are Perfect )
Latest papers with no code
TARN-VIST: Topic Aware Reinforcement Network for Visual Storytelling
As a cross-modal task, visual storytelling aims to generate a story for an ordered image sequence automatically.
AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production
In the Horizontal Layer, we introduce a novel RAG-based evolutionary system that optimizes the whole video generation workflow and the steps within the workflow.
Metamorpheus: Interactive, Affective, and Creative Dream Narration Through Metaphorical Visual Storytelling
It also remains unknown how technologies such as generative AI models can facilitate the meaning making process, and ultimately support affective mindfulness.
SCO-VIST: Social Interaction Commonsense Knowledge-based Visual Storytelling
This weighted story graph produces the storyline in a sequence of events using Floyd-Warshall's algorithm.
MagicScroll: Nontypical Aspect-Ratio Image Generation for Visual Storytelling via Multi-Layered Semantic-Aware Denoising
Visual storytelling often uses nontypical aspect-ratio images like scroll paintings, comic strips, and panoramas to create an expressive and compelling narrative.
DiffuVST: Narrating Fictional Scenes with Global-History-Guided Denoising Models
Recent advances in image and video creation, especially AI-based image synthesis, have led to the production of numerous visual scenes that exhibit a high level of abstractness and diversity.
Visual Storytelling with Question-Answer Plans
Our model translates the image sequence into a visual prefix, a sequence of continuous embeddings which language models can interpret.
Comics for Everyone: Generating Accessible Text Descriptions for Comic Strips
Comic strips are a popular and expressive form of visual storytelling that can convey humor, emotion, and information.
Text-Only Training for Visual Storytelling
Visual storytelling aims to generate a narrative based on a sequence of images, necessitating both vision-language alignment and coherent story generation.
AOG-LSTM: An adaptive attention neural network for visual storytelling
Moreover, the existing method of alleviating error accumulation based on replacing reference words does not take into account the different effects of each word.