…text segmentation and text segment classification) tasks and comprises 169 documents and gold standard annotations for page segments. Partition (P2) contains 75 documents with a significantly richer…
1 PAPER • NO BENCHMARKS YET
…In FUNSD and CORD, segment layout annotations are aligned with labeled entities, so they do not reflect the reading-order issue of NER on scanned VrDs and are thus unsuitable for evaluating current … Their segment layout annotations are aligned with real-world situations, and entity mentions are labeled on words. The proposed FUNSD-r consists of 199 document samples, including the image, layout annotations of segments and words, and labeled entities of 3 categories.
3 PAPERS • 1 BENCHMARK
…In FUNSD and CORD, segment layout annotations are aligned with labeled entities, so they do not reflect the reading-order issue of NER on scanned VrDs and are thus unsuitable for evaluating current … Their segment layout annotations are aligned with real-world situations, and entity mentions are labeled on words. The proposed CORD-r consists of 999 document samples, including the image, layout annotations of segments and words, and labeled entities of 30 categories.
…Annotations include: multiple POS tags, morphological features and lemmatization; sentence segmentation and rough speech act; document structure in TEI XML (paragraphs, headings, figures, etc.)
8 PAPERS • 1 BENCHMARK