Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding

11 Mar 2022  ·  Yidan Sun, Qin Chao, Yangfeng Ji, Boyang Li ·

Despite recent advances of AI, story understanding remains an open and under-investigated problem. We collect, preprocess, and publicly release a video-language story dataset, Synopses of Movie Narratives (SyMoN), containing 5,193 video summaries of popular movies and TV series with a total length of 869 hours. SyMoN captures naturalistic storytelling videos made by human creators and intended for a human audience. As a prototypical and naturalistic story dataset, SyMoN features high coverage of multimodal story events and abundant mental-state descriptions. Its use of storytelling techniques cause cross-domain semantic gaps that provide appropriate challenges to existing models. We establish benchmarks on video-text retrieval and zero-shot alignment on movie summary videos, which showcase the importance of in-domain data and long-term memory in story understanding. With SyMoN, we hope to lay the groundwork for progress in multimodal story understanding.

PDF Abstract

Datasets


Introduced in the Paper:

SYMON

Used in the Paper:

Kinetics ActivityNet MovieNet CMD

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here