5 dataset results for Discourse Parsing AND Texts

DISRPT2021

DISRPT2021 (DISRPT2021 shared task on Discourse Unit Segmentation, Connective Detection and Discourse Relation Classification)

The DISRPT 2021 shared task, co-located with CODI 2021 at EMNLP, introduces the second iteration of a cross-formalism shared task on discourse unit segmentation and connective detection, as well as the first iteration of a cross-formalism discourse relation classification task.

3 PAPERS • NO BENCHMARKS YET

AMALGUM

AMALGUM (A Machine Annotated Lookalike of GUM)

AMALGUM is a machine annotated multilayer corpus following the same design and annotation layers as GUM, but substantially larger (around 4M tokens). The goal of this corpus is to close the gap between high quality, richly annotated, but small datasets, and the larger but shallowly annotated corpora that are often scraped from the Web.

5 PAPERS • NO BENCHMARKS YET

SPOT (Sentiment Polarity Annotations Dataset)

The SPOT dataset contains 197 reviews originating from the Yelp'13 and IMDB collections (1), annotated with segment-level polarity labels (positive/neutral/negative). Annotations have been gathered on 2 levels of granulatiry:

3 PAPERS • NO BENCHMARKS YET

Instructional-DT (Instr-DT)

Instructional-DT (Instr-DT) (Instructional Discourse Treebank)

This discourse treebank includes annotated instructional texts originally assembled at the Information Technology Research Institute, University of Brighton. This dataset contains 176 documents with an average of 32.6 EDUs for a total of 5744 EDUs and 53,250 words.

4 PAPERS • 1 BENCHMARK

GUM (Georgetown University Multilayer corpus)

GUM is an open source multilayer English corpus of richly annotated texts from twelve text types. Annotations include:

8 PAPERS • 1 BENCHMARK

Datasets

5 dataset results for Discourse Parsing AND Texts