The Content Types Dataset: a New Resource to Explore Semantic and Functional Characteristics of Texts

EACL 2017 · Rachele Sprugnoli, Tommaso Caselli, Sara Tonelli, Giovanni Moretti

This paper presents a new resource, the Content Types Dataset, to promote the analysis of texts as a composition of units with specific semantic and functional roles. By developing this dataset, we also introduce a new NLP task: the automatic classification of Content Types. The annotation scheme and the dataset are described together with two sets of classification experiments.
