10 papers with code • 1 benchmark • 2 datasets
Though impressive results have been achieved in visual captioning, the task of generating abstract stories from photo streams remains a largely untapped problem.
The task of multi-image cued story generation, exemplified by the Visual Storytelling Dataset (VIST) challenge, is to compose multiple coherent sentences from a given sequence of images.
We present a neural model for generating short stories from image sequences, which extends the image description model of Vinyals et al. (2015).
Previous storytelling approaches mostly focused on optimizing traditional metrics such as BLEU, ROUGE, and CIDEr.
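To make the metric family concrete, here is a minimal sketch of BLEU-1 (clipped unigram precision with a brevity penalty), the simplest member of the BLEU family mentioned above. This is an illustrative toy implementation, not the evaluation code used by any of the listed papers; real VIST evaluations use multi-reference, multi-n-gram variants. The example sentences are invented for demonstration.

```python
import math
from collections import Counter

def bleu1(reference, hypothesis):
    """Toy single-reference BLEU-1: clipped unigram precision x brevity penalty."""
    # Clip each hypothesis token's count by its count in the reference,
    # so repeating a correct word does not inflate the score.
    ref_counts = Counter(reference)
    overlap = sum(min(c, ref_counts[w]) for w, c in Counter(hypothesis).items())
    precision = overlap / max(len(hypothesis), 1)
    # Brevity penalty discourages trivially short captions.
    if len(hypothesis) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / max(len(hypothesis), 1))
    return bp * precision

# Hypothetical reference/candidate pair for illustration only.
ref = "the family gathered on the beach at sunset".split()
hyp = "the family gathered at the beach".split()
print(round(bleu1(ref, hyp), 3))  # every candidate word matches, but brevity penalty applies
```

Full BLEU averages clipped precisions over 1- to 4-grams; optimizing such n-gram overlap directly is exactly what the subsequent storytelling work argues rewards generic, repetitive sentences over coherent narratives.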
To solve this problem, we propose a method that mines cross-modal rules to help the model infer informative concepts from a given visual input.