GUM is an open-source, multilayer English corpus of richly annotated texts from twelve text types. Annotations include:
- Multiple POS tags, morphological features and lemmatization
- Sentence segmentation and coarse-grained speech act labels
- Document structure in TEI XML (paragraphs, headings, figures, etc.)
- ISO date/time annotations
- Speaker and addressee information (where relevant)
- Constituent and dependency syntax
- Information status (given, accessible, new, split antecedent)
- Entity and coreference annotation, including bridging anaphora
- Entity linking (Wikification)
- Discourse parses in Rhetorical Structure Theory and discourse dependencies
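
As a rough illustration of how the token-level layers (POS tags, morphological features, lemmas, dependency syntax) might be consumed, the sketch below reads a sentence in the ten-column CoNLL-U layout. It assumes the dependency annotations are available in CoNLL-U; the sample sentence is invented for the example and is not taken from the corpus, and the names `SAMPLE`, `FIELDS` and `parse_conllu` are hypothetical helpers, not part of any GUM tooling.

```python
# Minimal CoNLL-U reader (standard library only).
# Assumption: annotations are in the ten-column CoNLL-U format.
# SAMPLE is an invented sentence for illustration, not corpus data.

SAMPLE = (
    "# sent_id = example-1\n"
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\tVBP\tMood=Ind|Tense=Pres|VerbForm=Fin\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_\n"
    "\n"
)

FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def parse_conllu(text):
    """Return a list of sentences, each a list of token dicts."""
    sentences, tokens = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if tokens:
                sentences.append(tokens)
                tokens = []
            continue
        if line.startswith("#"):
            continue  # skip sentence-level comment lines
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip malformed rows, multiword ranges (1-2), empty nodes (1.1)
        tok = dict(zip(FIELDS, cols))
        tok["id"] = int(tok["id"])
        tok["head"] = int(tok["head"]) if tok["head"].isdigit() else None
        tokens.append(tok)
    if tokens:
        sentences.append(tokens)
    return sentences


sentences = parse_conllu(SAMPLE)
for tok in sentences[0]:
    print(tok["id"], tok["form"], tok["lemma"], tok["xpos"], tok["head"], tok["deprel"])
```

Each token becomes a plain dictionary, so the other columns (for example `feats` or `misc`) can be inspected the same way. Document-level layers such as TEI structure, coreference or RST parses would need their own readers.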