Storytelling from an Image Stream Using Scene Graphs

Visual storytelling aims to generate a story from an image stream. Most existing methods represent images directly with extracted high-level features, which is neither intuitive nor easy to interpret. We argue that translating each image into a graph-based semantic representation, i.e., a scene graph, which explicitly encodes the objects and relationships detected within the image, would benefit the representation and description of images. To this end, we propose a novel graph-based architecture for visual storytelling that models relationships on scene graphs at two levels. At the within-image level, we employ a Graph Convolution Network (GCN) to enrich the local fine-grained region representations of objects on scene graphs. To further model the interaction among images, at the cross-image level, a Temporal Convolution Network (TCN) is used to refine the region representations along the temporal dimension. The relation-aware representations are then fed into a Gated Recurrent Unit (GRU) with an attention mechanism for story generation. Experiments are conducted on the public visual storytelling dataset (VIST). Both automatic and human evaluation results indicate that our method achieves state-of-the-art performance.
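To make the pipeline concrete, below is a minimal PyTorch sketch of the two-level relation modeling described in the abstract: a GCN layer that passes messages along scene-graph edges within each image, a temporal convolution that refines region representations across the image stream, and a GRU decoder with attention over the resulting per-image representations. All module names, dimensions, the normalized-adjacency graph format, and the mean-pooling step are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the two-level relation modeling, assuming region features
# extracted per image and a normalized scene-graph adjacency matrix per image.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SceneGraphGCN(nn.Module):
    """Within-image level: enrich region features by message passing
    along scene-graph edges (basic residual GCN layer, assumed form)."""

    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, regions, adj):
        # regions: (num_regions, dim); adj: (num_regions, num_regions),
        # a normalized adjacency built from the detected relationships.
        messages = adj @ self.proj(regions)
        return F.relu(regions + messages)


class TemporalConv(nn.Module):
    """Cross-image level: refine each region slot along the temporal
    (image-stream) dimension with a 1-D convolution (kernel size assumed)."""

    def __init__(self, dim):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size=3, padding=1)

    def forward(self, feats):
        # feats: (num_images, num_regions, dim); convolve each region slot
        # across the image stream.
        x = feats.permute(1, 2, 0)            # (num_regions, dim, num_images)
        x = F.relu(self.conv(x))
        return x.permute(2, 0, 1)             # (num_images, num_regions, dim)


class AttnGRUDecoder(nn.Module):
    """Story generation: one GRU step attending over the relation-aware
    per-image representations (standard additive attention, assumed)."""

    def __init__(self, dim, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.gru = nn.GRUCell(2 * dim, dim)
        self.attn = nn.Linear(2 * dim, 1)
        self.out = nn.Linear(dim, vocab_size)

    def step(self, token, hidden, contexts):
        # token: (1,) previous word id; hidden: (1, dim);
        # contexts: (num_images, dim) relation-aware image representations.
        query = hidden.expand(contexts.size(0), -1)
        scores = self.attn(torch.cat([contexts, query], dim=-1)).squeeze(-1)
        ctx = (F.softmax(scores, dim=0).unsqueeze(-1) * contexts).sum(0)
        inp = torch.cat([self.embed(token), ctx.unsqueeze(0)], dim=-1)
        hidden = self.gru(inp, hidden)
        return self.out(hidden), hidden       # next-word logits, new state


# Toy usage: 5 images, 36 regions each, identity graph as a placeholder.
dim, vocab = 256, 10000
regions = torch.randn(5, 36, dim)
adj = torch.eye(36)
gcn, tcn, dec = SceneGraphGCN(dim), TemporalConv(dim), AttnGRUDecoder(dim, vocab)
enriched = torch.stack([gcn(r, adj) for r in regions])  # within-image GCN
contexts = tcn(enriched).mean(dim=1)  # cross-image TCN, then pool per image
logits, h = dec.step(torch.tensor([1]), torch.zeros(1, dim), contexts)
```

In practice the adjacency would come from a scene graph parser over the detected objects rather than the identity matrix used above, and decoding would iterate `step` with the sampled token fed back in.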


Results from the Paper


| Task                | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---------------------|---------|-------|-------------|--------------|-------------|
| Visual Storytelling | VIST    | SGVST | BLEU-4      | 14.7         | #7          |
| Visual Storytelling | VIST    | SGVST | METEOR      | 35.8         | #7          |
| Visual Storytelling | VIST    | SGVST | CIDEr       | 9.8          | #12         |
| Visual Storytelling | VIST    | SGVST | ROUGE-L     | 29.9         | #15         |
