StoryBench is a multi-task benchmark to reliably evaluate the ability of text-to-video models to generate stories from a sequence of captions and their duration. It includes three datasets (DiDeMo, Oops, UVO) and three video generation tasks of increasing difficulty: action execution, where the next action must be generated starting from a conditioning video; story continuation, where a sequence of actions must be executed starting from a conditioning video; and story generation, where a video must be generated from only text prompts.
0 PAPER • 1 BENCHMARK