In Token Path Prediction, we introduce FUNSD-r and CORD-r, revised VrD-NER datasets that reflect real-world scenarios of NER on scanned visually-rich documents (VrDs).

In FUNSD and CORD, segment layout annotations are aligned with the labeled entities. As a result, these datasets do not reflect the reading-order issue of NER on scanned VrDs and are therefore unsuitable for evaluating current methods. In FUNSD-r and CORD-r, we automatically reannotate the layouts with the PP-OCRv3 OCR system and manually reannotate the named entities as word sequences based on the new layout annotations. Their segment layout annotations thus match real-world conditions, and entity mentions are labeled on words.
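To make the reading-order issue concrete, here is a minimal Python sketch on hypothetical toy data (not the datasets' actual file format). It assumes OCR segments are lists of word ids and that a sequence labeler reads words in segment-concatenation order. An entity labeled as a word sequence can then be expressed with BIO tags only if its words form one consecutive, in-order run in that serialized order. The function names `serialize` and `is_contiguous` are illustrative, not part of any released code.

```python
# Hypothetical toy layout: word ids, with OCR segments listed in reading order.
def serialize(segments: list[list[int]]) -> list[int]:
    """Concatenate word ids segment by segment: the order a sequence labeler sees."""
    return [w for seg in segments for w in seg]


def is_contiguous(entity: list[int], order: list[int]) -> bool:
    """True if the entity's words form one consecutive, in-order run of `order`,
    which is the only kind of mention a BIO tag sequence can express."""
    if not entity:
        return False
    pos = {w: i for i, w in enumerate(order)}
    idx = [pos[w] for w in entity]
    return idx == list(range(idx[0], idx[0] + len(idx)))


if __name__ == "__main__":
    # Words: 0 "Total", 1 "amount", 2 "$120.00" (right column), 3 "due" (next line).
    # The OCR system emits "$120.00" as its own segment between the two lines
    # of the left-column entity "Total amount due".
    segments = [[0, 1], [2], [3]]
    order = serialize(segments)             # [0, 1, 2, 3]
    print(is_contiguous([0, 1, 3], order))  # False: "$120.00" interrupts the mention
    print(is_contiguous([2], order))        # True
```

When layouts come from a real OCR system rather than being aligned with entities, mentions like "Total amount due" can be split by unrelated segments, which is exactly the situation FUNSD-r and CORD-r are meant to expose.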

FUNSD-r consists of 199 document samples, each comprising the document image, layout annotations for segments and words, and entities labeled in 3 categories. For detailed summary statistics, please refer to the original paper.
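The sample contents described above could be held in a record like the one sketched below. This is a hypothetical schema for illustration only: the field names, the `(x0, y0, x1, y1)` pixel box convention, the assumption that segments partition the words, and the category names used in the example (`"question"`, `"answer"`) are assumptions, not the released file format.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Word:
    text: str
    box: tuple[int, int, int, int]  # (x0, y0, x1, y1) in image pixels (assumed convention)


@dataclass
class Entity:
    label: str           # one of the 3 categories (names assumed for illustration)
    word_ids: list[int]  # entity mention as an ordered word sequence


@dataclass
class DocumentSample:
    image_path: str
    words: list[Word]
    segments: list[list[int]]  # each OCR segment is a list of word ids (assumed)
    entities: list[Entity] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if the annotations are internally inconsistent."""
        n = len(self.words)
        seg_words = [w for seg in self.segments for w in seg]
        # Assumption: every word belongs to exactly one segment.
        if sorted(seg_words) != list(range(n)):
            raise ValueError("segments must partition the word ids")
        for e in self.entities:
            if not e.word_ids or any(not 0 <= w < n for w in e.word_ids):
                raise ValueError(f"entity {e.label!r} has invalid word ids")

    def label_counts(self) -> Counter:
        """Number of entities per category in this sample."""
        return Counter(e.label for e in self.entities)


if __name__ == "__main__":
    doc = DocumentSample(
        image_path="sample.png",  # hypothetical path
        words=[Word("Total", (0, 0, 40, 10)), Word("due", (0, 12, 25, 22)),
               Word("$5", (60, 0, 80, 10))],
        segments=[[0], [2], [1]],
        entities=[Entity("question", [0, 1]), Entity("answer", [2])],
    )
    doc.validate()
    print(doc.label_counts())
```

Keeping words, segments, and entities as separate lists linked by word ids mirrors the point made above: entity mentions are defined on words, independently of how the OCR system grouped those words into segments.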


License: CC-BY-4.0