Visual Storytelling
25 papers with code • 1 benchmark • 4 datasets
Latest papers with no code
Knowledge-enriched Attention Network with Group-wise Semantic for Visual Storytelling
A unified one-stage story generation model with an encoder-decoder structure is proposed to simultaneously train and infer the knowledge-enriched attention network, group-wise semantic module, and multi-modal story generation decoder in an end-to-end fashion.
A System for Image Understanding using Sensemaking and Narrative
Sensemaking and narrative are two inherently interconnected concepts about how people understand the world around them.
Discourse Analysis for Evaluating Coherence in Video Paragraph Captions
We also introduce DisNet, a novel dataset containing the proposed visual discourse annotations of 3000 videos and their paragraphs.
Visual Storytelling with Hierarchical BERT Semantic Guidance
As there is no ground-truth topic information, a pre-trained BERT model based on visual contents and annotated stories is utilized to mine topics.
RoViST: Learning Robust Metrics for Visual Storytelling
We measure the reliability of our metric sets by analyzing their correlation with human judgement scores on a sample of machine stories obtained from four state-of-the-art models trained on the Visual Storytelling Dataset (VIST).
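Correlating a candidate metric's scores with human judgements is the standard way to validate such a metric. A minimal sketch of the idea (pure Python, with hypothetical scores; not the RoViST authors' evaluation code):

```python
import math

def pearson(xs, ys):
    """Pearson correlation between metric scores and human judgement scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical scores for five machine-generated stories:
metric_scores = [0.71, 0.42, 0.88, 0.30, 0.65]
human_scores = [4.0, 2.5, 4.5, 2.0, 3.5]
print(pearson(metric_scores, human_scores))  # high value => reliable metric
```

A rank correlation such as Spearman's rho is often reported alongside Pearson's r, since human ratings are ordinal.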
Towards Coherent Visual Storytelling with Ordered Image Attention
To this end, we develop a novel message-passing-like algorithm for ordered image attention (OIA) that collects interactions across all the images in the sequence.
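The core idea of attending across all images in the sequence can be illustrated with a simple softmax attention pass, where each image's representation aggregates features from every image. This is a sketch under assumed shapes, not the paper's OIA algorithm:

```python
import math

def attend_across_images(feats):
    """For each image feature vector, compute a softmax-weighted
    combination of all image features in the sequence (self-attention
    over the ordered image sequence)."""
    out = []
    for q in feats:
        # Dot-product score between this image and every image in the sequence.
        scores = [sum(a * b for a, b in zip(q, k)) for k in feats]
        # Numerically stable softmax over the scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of all image features: the "collected interactions".
        ctx = [sum(w * k[d] for w, k in zip(weights, feats))
               for d in range(len(q))]
        out.append(ctx)
    return out
```

In OIA proper, the attention is additionally ordered and iterated message-passing style so that each sentence position focuses on its corresponding image while still consulting the others.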
Learning to Rank Visual Stories From Human Ranking Data
In this paper, we present the VHED (VIST Human Evaluation Data) dataset, which re-purposes human evaluation results for automatic evaluation; building on it, we develop Vrank (VIST Ranker), a novel reference-free VIST metric for story evaluation.
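A ranker trained from human preference data is typically optimized with a pairwise objective: the score of the story humans preferred should exceed the score of the other by a margin. A minimal sketch of that loss (hypothetical; Vrank's actual model and training setup are not shown here):

```python
def pairwise_margin_loss(score_preferred, score_other, margin=1.0):
    """Hinge loss: zero when the human-preferred story already scores
    at least `margin` higher than the other story, positive otherwise."""
    return max(0.0, margin - (score_preferred - score_other))

# Hypothetical scores from a learned story scorer:
print(pairwise_margin_loss(2.0, 0.5))  # preferred story wins by > margin
print(pairwise_margin_loss(0.5, 0.4))  # too close: loss pushes them apart
```

Because the scorer only needs the two generated stories (no reference story), the resulting metric is reference-free.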
Graph Similarities and Dual Approach for Sequential Text-to-Image Retrieval
We cast video captioning as a dual learning task that reconstructs the input story from the sampled image sequence.
Ordered Attention for Coherent Visual Storytelling
OIA models interactions between the sentence-corresponding image and important regions in other images of the sequence.
Stretch-VST: Getting Flexible With Visual Stories
Therefore, we propose to "stretch" the stories, which creates the potential to present in-depth visual details.