Diverse and Relevant Visual Storytelling with Scene Graph Embeddings

A problem in automatically generated stories for image sequences is that they use overly generic vocabulary and phrase structure and fail to match the distributional characteristics of human-generated text. We address this problem by introducing explicit representations for objects and their relations by extracting scene graphs from the images. Utilizing an embedding of this scene graph enables our model to more explicitly reason over objects and their relations during story generation, compared to the global features from an object classifier used in previous work. We apply metrics that account for the diversity of words and phrases of generated stories as well as for reference to narratively-salient image features and show that our approach outperforms previous systems. Our experiments also indicate that our models obtain competitive results on reference-based metrics.
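
The approach is only summarized above; as a rough illustration, the sketch below shows one plausible way to embed extracted scene-graph triples and condition a story decoder on the resulting vector. It is not the authors' implementation: the class names, embedding dimensions, mean-pooling, and GRU decoder are assumptions made for the example.

```python
# Hypothetical sketch of scene-graph-conditioned story generation.
# Assumes (subject, predicate, object) category-id triples have already been
# extracted from an image by a scene-graph parser.
import torch
import torch.nn as nn

class SceneGraphEncoder(nn.Module):
    """Embed (subject, predicate, object) triples and pool them into one vector."""
    def __init__(self, num_objects, num_predicates, dim=256):
        super().__init__()
        self.obj_emb = nn.Embedding(num_objects, dim)
        self.pred_emb = nn.Embedding(num_predicates, dim)
        self.proj = nn.Linear(3 * dim, dim)

    def forward(self, triples):
        # triples: (num_triples, 3) LongTensor of [subject, predicate, object] ids
        subj = self.obj_emb(triples[:, 0])
        pred = self.pred_emb(triples[:, 1])
        obj = self.obj_emb(triples[:, 2])
        per_triple = self.proj(torch.cat([subj, pred, obj], dim=-1))
        return per_triple.mean(dim=0)  # single scene-graph embedding for the image

class StoryDecoder(nn.Module):
    """GRU language model conditioned on the scene-graph embedding at every step."""
    def __init__(self, vocab_size, dim=256):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, dim)
        self.gru = nn.GRU(2 * dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, word_ids, graph_vec):
        # word_ids: (batch, seq_len); graph_vec: (dim,)
        w = self.word_emb(word_ids)
        g = graph_vec.expand(w.size(0), w.size(1), -1)  # broadcast graph vector over steps
        h, _ = self.gru(torch.cat([w, g], dim=-1))
        return self.out(h)  # next-word logits, (batch, seq_len, vocab_size)

# Toy usage: two triples, e.g. (person, holding, dog) and (dog, on, beach).
encoder = SceneGraphEncoder(num_objects=150, num_predicates=50)
decoder = StoryDecoder(vocab_size=10000)
triples = torch.tensor([[3, 7, 12], [12, 4, 9]])
logits = decoder(torch.randint(0, 10000, (1, 20)), encoder(triples))
```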

Datasets

VIST

Results from the Paper

Task: Visual Storytelling | Dataset: VIST | Model: SGEmb

Metric    Value   Global Rank
BLEU-1     62.2   #13
BLEU-2     38.7   #9
BLEU-3     23.5   #7
BLEU-4     14.8   #5
METEOR     35.6   #10
CIDEr       8.6   #19
ROUGE-L    30.2   #7
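
BLEU-1/2/3/4, METEOR, ROUGE-L, and CIDEr are reference-based metrics: they score a generated story against human-written stories for the same image sequence. The leaderboard's exact evaluation scripts are not reproduced here, but as an illustrative approximation the BLEU rows can be computed with NLTK (toy tokenized data shown):

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# One generated story and its human reference, both tokenized (toy example).
hypotheses = [["the", "family", "had", "a", "great", "time", "at", "the", "beach"]]
references = [[["we", "had", "a", "great", "time", "at", "the", "beach"]]]

smooth = SmoothingFunction().method1  # avoids zero scores for short texts
for n in range(1, 5):
    weights = tuple(1.0 / n for _ in range(n))
    score = corpus_bleu(references, hypotheses, weights=weights,
                        smoothing_function=smooth)
    print(f"BLEU-{n}: {100 * score:.1f}")
```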
