Character-Preserving Coherent Story Visualization

Story visualization aims at generating a sequence of images to narrate each sentence in a multi-sentence story. Different from video generation that focuses on maintaining the continuity of generated images (frames), story visualization emphasizes preserving the global consistency of characters and scenes across different story pictures, which is very challenging since story sentences only provide sparse signals for generating images. Therefore, we propose a new framework named Character-Preserving Coherent Story Visualization (CP-CSV) to tackle the challenges. CP-CSV effectively learns to visualize the story by three critical modules: story and context encoder (story and sentence representation learning), figure-ground segmentation (auxiliary task to provide information for preserving character and story consistency), and figure-ground aware generation (image sequence generation by incorporating figure-ground information). Moreover, we propose a metric named Fr\'{e}chet Story Distance (FSD) to evaluate the performance of story visualization. Extensive experiments demonstrate that CP-CSV maintains the details of character information and achieves high consistency among different frames, while FSD better measures the performance of story visualization.

PDF Abstract

Datasets


Results from the Paper


Ranked #2 on Story Visualization on Pororo (using extra training data)

     Get a GitHub badge
Task Dataset Model Metric Name Metric Value Global Rank Uses Extra
Training Data
Benchmark
Story Visualization Pororo CPCSV FID 67.7 # 2
FSD 71.51 # 1
Story Visualization Pororo StoryGAN FID 77.67 # 3
FSD 111.09 # 2

Methods


No methods listed for this paper. Add relevant methods here