Introduced by Bamman et al. in "An Annotated Dataset of Literary Entities"

LitBank is an annotated dataset of 100 works of English-language fiction to support tasks in natural language processing and the computational humanities, described in more detail in the following publications:

  • David Bamman, Sejal Popat and Sheng Shen (2019), "An Annotated Dataset of Literary Entities," NAACL 2019.
  • Matthew Sims, Jong Ho Park and David Bamman (2019), "Literary Event Detection," ACL 2019.
  • David Bamman, Olivia Lewke and Anya Mansoor (2020), "An Annotated Dataset of Coreference in English Literature," LREC 2020.

LitBank currently contains annotations for entities, events, entity coreference, and quotation attribution in a sample of ~2,000 words from each of those texts, totaling 210,532 tokens.
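
As a quick sanity check on the figures above, the following minimal sketch computes the average number of tokens per text from the stated totals. The constant and function names are illustrative and not part of LitBank. The result comes out slightly above 2,000, which would be consistent with each sample being roughly 2,000 words if the token count also includes punctuation; that reading is an assumption, not something the description states.

```python
# Figures taken from the dataset description above.
NUM_TEXTS = 100          # works of English-language fiction
TOTAL_TOKENS = 210_532   # total annotated tokens


def mean_tokens_per_text(total_tokens: int, num_texts: int) -> float:
    """Return the average number of tokens contributed by each text."""
    if num_texts <= 0:
        raise ValueError("num_texts must be positive")
    return total_tokens / num_texts


average = mean_tokens_per_text(TOTAL_TOKENS, NUM_TEXTS)
print(f"{average:.2f} tokens per text on average")
```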

LitBank is licensed under a Creative Commons Attribution 4.0 International License.

