GUM (Georgetown University Multilayer corpus)

GUM is an open-source, multilayer corpus of richly annotated English texts from twelve text types. Annotations include:

  • Multiple POS tags, morphological features and lemmatization
  • Sentence segmentation and rough speech act labels
  • Document structure in TEI XML (paragraphs, headings, figures, etc.)
  • ISO date/time annotations
  • Speaker and addressee information (where relevant)
  • Constituent and dependency syntax
  • Information status (given, accessible, new, split antecedent)
  • Entity and coreference annotation, including bridging anaphora
  • Entity linking (Wikification)
  • Discourse parses in Rhetorical Structure Theory and discourse dependencies
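
As a rough illustration of how the POS, lemma, morphology and dependency layers listed above can be consumed together, here is a minimal sketch that reads a CoNLL-U fragment, the tab-separated format commonly used for dependency treebanks. It assumes the dependency layer is available in CoNLL-U; the sample sentence, the `Token` class and the `parse_conllu` helper are hypothetical names made up for this sketch and are not part of GUM or any official tooling.

```python
from dataclasses import dataclass
from typing import List

# Illustrative sentence invented for this sketch; it is NOT taken from GUM.
SAMPLE = """\
# sent_id = example-1
# text = Dogs chase cats .
1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_
2\tchase\tchase\tVERB\tVBP\t_\t0\troot\t_\t_
3\tcats\tcat\tNOUN\tNNS\tNumber=Plur\t2\tobj\t_\t_
4\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_
"""


@dataclass
class Token:
    """One regular token row: the first eight CoNLL-U columns."""
    idx: int
    form: str
    lemma: str
    upos: str
    xpos: str
    feats: str
    head: int
    deprel: str


def parse_conllu(text: str) -> List[List[Token]]:
    """Split CoNLL-U text into sentences, each a list of Token objects."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level comments (sent_id, text, ...) are ignored here.
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():
            # Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
            continue
        current.append(
            Token(int(cols[0]), cols[1], cols[2], cols[3],
                  cols[4], cols[5], int(cols[6]), cols[7])
        )
    if current:
        sentences.append(current)
    return sentences


sentences = parse_conllu(SAMPLE)
for tok in sentences[0]:
    # HEAD is a 1-based token index; 0 marks the root of the sentence.
    head = "ROOT" if tok.head == 0 else sentences[0][tok.head - 1].form
    print(f"{tok.form}\t{tok.upos}\t{tok.deprel}\t-> {head}")
```

The other layers (TEI document structure, coreference, RST discourse parses) use their own formats and would need separate readers; this sketch covers only the token-level columns.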
