Visual Storytelling

25 papers with code • 1 benchmark • 4 datasets

(Image credit: No Metrics Are Perfect)

Latest papers with no code

Knowledge-enriched Attention Network with Group-wise Semantic for Visual Storytelling

no code yet • 10 Mar 2022

Third, a unified one-stage story generation model with encoder-decoder structure is proposed to simultaneously train and infer the knowledge-enriched attention network, group-wise semantic module and multi-modal story generation decoder in an end-to-end fashion.

A System for Image Understanding using Sensemaking and Narrative

no code yet • 21 Jan 2022

Sensemaking and narrative are two inherently interconnected concepts about how people understand the world around them.

Discourse Analysis for Evaluating Coherence in Video Paragraph Captions

no code yet • 17 Jan 2022

We also introduce DisNet, a novel dataset containing the proposed visual discourse annotations of 3000 videos and their paragraphs.

Visual Storytelling with Hierarchical BERT Semantic Guidance

no code yet • ACM Multimedia Asia 2022

As there is no ground-truth topic information, a pre-trained BERT model based on visual contents and annotated stories is utilized to mine topics.

RoViST: Learning Robust Metrics for Visual Storytelling

no code yet • ACL ARR December 2022

We measure the reliability of our metric sets by analysing their correlation with human judgement scores on a sample of machine stories obtained from 4 state-of-the-art models trained on the Visual Storytelling Dataset (VIST).

Towards Coherent Visual Storytelling with Ordered Image Attention

no code yet • ACL ARR November 2021

To this end, we develop a novel message-passing-like algorithm for ordered image attention (OIA) that collects interactions across all the images in the sequence.

Learning to Rank Visual Stories From Human Ranking Data

no code yet • ACL ARR November 2021

In this paper, we present the VHED (VIST Human Evaluation Data) dataset, which first re-purposes human evaluation results for automatic evaluation; hence we develop Vrank (VIST Ranker), a novel reference-free VIST metric for story evaluation.

Graph Similarities and Dual Approach for Sequential Text-to-Image Retrieval

no code yet • 29 Sep 2021

We set video captioning as a dual learning task that reconstructs the input story from the sampled image sequence.

Ordered Attention for Coherent Visual Storytelling

no code yet • 4 Aug 2021

OIA models interactions between the sentence-corresponding image and important regions in other images of the sequence.

Stretch-VST: Getting Flexible With Visual Stories

no code yet • ACL 2021

Therefore, we propose to "stretch" the stories, which creates the potential to present in-depth visual details.