GUM (Georgetown University Multilayer corpus)

GUM is an open-source, multilayer English corpus of richly annotated texts from twelve text types. Annotations include:

Multiple POS tags, morphological features and lemmatization
Sentence segmentation and rough speech act
Document structure in TEI XML (paragraphs, headings, figures, etc.)
ISO date/time annotations
Speaker and addressee information (where relevant)
Constituent and dependency syntax
Information status (given, accessible, new, split antecedent)
Entity and coreference annotation, including bridging anaphora
Entity linking (Wikification)
Discourse parses in Rhetorical Structure Theory and discourse dependencies
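
This page does not say which file formats the layers above are distributed in. As a minimal sketch, assuming the POS, lemma, morphology and dependency layers are available as CoNLL-U (the ten-column, tab-separated format used by Universal Dependencies treebanks), the token-level annotations could be read as shown below. The names `SAMPLE`, `FIELDS` and `parse_conllu` are hypothetical, and the two-token sample is invented for illustration rather than taken from GUM.

```python
from typing import Dict, List

# Hypothetical two-token sample in CoNLL-U layout (10 tab-separated columns).
# This is illustrative data, not an excerpt from GUM.
SAMPLE = (
    "# sent_id = example-1\n"
    "# text = Dogs bark\n"
    "1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\tVBP\t_\t0\troot\t_\t_\n"
    "\n"
)

# Standard CoNLL-U column names, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Split CoNLL-U text into sentences, each a list of token dicts."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level comments (sent_id, text, ...) are skipped here.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            # Skip lines that do not have exactly ten columns.
            continue
        current.append(dict(zip(FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


sentences = parse_conllu(SAMPLE)
for tok in sentences[0]:
    # Surface form, both POS tags, lemma, and the dependency edge.
    print(tok["form"], tok["upos"], tok["xpos"], tok["lemma"],
          tok["head"], tok["deprel"])
```

A plain list of dictionaries keeps the sketch free of dependencies. The other layers (coreference, entity linking, discourse parses) would need their own readers, which are not sketched here.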

Benchmarks

Task	Dataset Variant	Best Model
Entity Linking	GUM	baseline

Tasks

Coreference Resolution

Part-Of-Speech Tagging

Relation Classification

Nested Named Entity Recognition

Discourse Parsing

Nested Mention Recognition

Timex Normalization

Bridging Anaphora Resolution

Discourse Segmentation

Similar Datasets

OntoGUM

AMALGUM

License

CC-BY-NC-SA

Modalities

Texts
Speech

Languages

English