WritingPrompts is a large dataset of 300K human-written stories paired with writing prompts from an online forum.
94 PAPERS • 1 BENCHMARK
A creative writing task where the input is 4 random sentences and the output should be a coherent passage with 4 paragraphs that end in the 4 input sentences respectively. Such a task is open-ended and exploratory, and challenges creative thinking as well as high-level planning.
18 PAPERS • NO BENCHMARKS YET
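The format constraint in the four-sentence creative writing task above can be checked mechanically. The following is a minimal sketch, not an official evaluator for the task: the function name `ends_with_inputs` is hypothetical, and it assumes paragraphs are separated by blank lines and that each paragraph must end with its input sentence verbatim. It checks format only and says nothing about coherence.

```python
def ends_with_inputs(passage, sentences):
    """Return True if the passage has one paragraph per input sentence
    and each paragraph ends with its corresponding sentence.

    Assumption: paragraphs are separated by blank lines, and a paragraph
    "ends in" a sentence when that sentence is its exact suffix.
    """
    # Split on blank lines and drop empty fragments.
    paragraphs = [p.strip() for p in passage.split("\n\n") if p.strip()]
    if len(paragraphs) != len(sentences):
        return False
    # Compare paragraph i against input sentence i, in order.
    return all(p.endswith(s.strip()) for p, s in zip(paragraphs, sentences))


# Illustrative inputs (made up for this sketch, not taken from the dataset).
sentences = [
    "The door stayed shut.",
    "Nobody asked why.",
    "It rained all night.",
    "She kept the letter.",
]
passage = "\n\n".join(f"Some lead-in text. {s}" for s in sentences)

print(ends_with_inputs(passage, sentences))        # in-order passage
print(ends_with_inputs(passage, sentences[::-1]))  # sentences in the wrong order
```

A check like this is order-sensitive, matching the requirement that the paragraphs end in the input sentences "respectively".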
A version of the CMU Movie Summary Corpus (http://www.cs.cmu.edu/~ark/personas/), which was originally scraped from Wikipedia plot summaries. This version applies some cleaning, turns sentences into events, and sorts them into "genres" (via LDA).
2 PAPERS • NO BENCHMARKS YET
A collection of long-running (80+ episodes) science fiction TV show synopses, scraped from Fandom.com wikis. Collected Nov 2017. Each episode is considered a "story".
1 PAPER • 1 BENCHMARK
TVRecap is a story generation dataset that requires generating detailed TV show episode recaps from a brief summary and a set of documents describing the characters involved. Unlike other story generation datasets, TVRecap contains stories that are authored by professional screenwriters and that feature complex interactions among multiple characters. Generating stories in TVRecap requires using the brief summary to draw relevant information from the lengthy character documents provided. In addition, by swapping the input and output, TVRecap can serve as a challenging testbed for abstractive summarization.
1 PAPER • 4 BENCHMARKS