…text segmentation and text segment classification) tasks and comprises 169 documents and gold-standard annotations for page segments. Partition (P2) contains 75 documents with a significantly richer
1 PAPER • NO BENCHMARKS YET
…Annotations include: multiple POS tags, morphological features and lemmatization; sentence segmentation and rough speech act; document structure in TEI XML (paragraphs, headings, figures, etc.)
8 PAPERS • 1 BENCHMARK