Reading Order Matters: Information Extraction from Visually-rich Documents by Token Path Prediction

17 Oct 2023 · Chong Zhang, Ya Guo, Yi Tu, Huan Chen, Jinyang Tang, Huijia Zhu, Qi Zhang, Tao Gui

Recent advances in multimodal pre-trained models have significantly improved information extraction from visually-rich documents (VrDs), in which named entity recognition (NER) is treated as a sequence-labeling task of predicting BIO entity tags for tokens, following the typical NLP setting. However, the BIO-tagging scheme relies on the correct order of model inputs, which is not guaranteed in real-world NER on scanned VrDs, where text is recognized and arranged by OCR systems. This reading order issue hinders the accurate marking of entities by the BIO-tagging scheme, making it impossible for sequence-labeling methods to predict correct named entities. To address the reading order issue, we introduce Token Path Prediction (TPP), a simple prediction head that predicts entity mentions as token sequences within documents. As an alternative to token classification, TPP models the document layout as a complete directed graph of tokens and predicts token paths within the graph as entities. For better evaluation of VrD-NER systems, we also propose two revised benchmark datasets of NER on scanned documents that reflect real-world scenarios. Experimental results demonstrate the effectiveness of our method and suggest its potential to be a universal solution to various information extraction tasks on documents.
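
The contrast between BIO tagging and token paths can be illustrated with a small sketch. This is not the authors' implementation: the helper names `decode_bio` and `decode_token_paths`, the toy tokens, and the representation of predicted links as per-label lists of `(source, target)` index pairs are all assumptions made for illustration, and single-token entities are left out for brevity.

```python
from typing import Dict, List, Tuple

Entity = Tuple[str, str]  # (label, mention text)


def decode_bio(tokens: List[str], tags: List[str]) -> List[Entity]:
    """Collect mentions from BIO tags, reading tokens strictly in input order."""
    entities: List[Entity] = []
    label, words = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, tag_label = tag.partition("-")
        if prefix == "I" and tag_label == label:
            words.append(token)  # continue the currently open mention
            continue
        if words:
            entities.append((label, " ".join(words)))  # close the open mention
        if prefix == "B":
            label, words = tag_label, [token]  # open a new mention
        else:
            label, words = None, []  # "O" tag or an orphaned "I-" tag
    if words:
        entities.append((label, " ".join(words)))
    return entities


def decode_token_paths(
    tokens: List[str], links: Dict[str, List[Tuple[int, int]]]
) -> List[Entity]:
    """Follow directed token links (assumed format: label -> [(src, dst), ...])."""
    entities: List[Entity] = []
    for label, edges in links.items():
        next_token = dict(edges)
        targets = set(next_token.values())
        # A path starts at a token that has an outgoing link but no incoming one.
        for start in sorted(set(next_token) - targets):
            path = [start]
            while path[-1] in next_token and next_token[path[-1]] not in path:
                path.append(next_token[path[-1]])
            entities.append((label, " ".join(tokens[i] for i in path)))
    return entities


# Hypothetical OCR output that interleaves two columns:
# column 1 reads "Alice Smith", column 2 reads "Acme Corp".
tokens = ["Alice", "Acme", "Smith", "Corp"]

# Best-effort BIO tags in OCR order; the mentions are not contiguous.
bio_tags = ["B-PER", "B-ORG", "I-PER", "I-ORG"]
print(decode_bio(tokens, bio_tags))
# [('PER', 'Alice'), ('ORG', 'Acme')]

# Links between tokens do not depend on the input order.
links = {"PER": [(0, 2)], "ORG": [(1, 3)]}
print(decode_token_paths(tokens, links))
# [('PER', 'Alice Smith'), ('ORG', 'Acme Corp')]
```

In this toy case no BIO tag assignment over the interleaved order can mark both mentions in full, whereas linking tokens directly recovers them regardless of how the OCR system arranged the text.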

Results from the Paper

Task                           | Dataset     | Model            | Metric Name                     | Metric Value | Global Rank
-------------------------------+-------------+------------------+---------------------------------+--------------+------------
Named Entity Recognition (NER) | CORD-r      | TPP (LayoutMask) | F1                              | 89.34        | # 2
Named Entity Recognition (NER) | CORD-r      | TPP (LayoutLMv3) | F1                              | 91.85        | # 1
Relation Extraction            | FUNSD       | TPP (LayoutMask) | F1                              | 79.20        | # 4
Entity Linking                 | FUNSD       | TPP (LayoutMask) | F1                              | 79.20        | # 1
Named Entity Recognition (NER) | FUNSD-r     | TPP (LayoutLMv3) | F1                              | 80.40        | # 1
Named Entity Recognition (NER) | FUNSD-r     | TPP (LayoutMask) | F1                              | 78.19        | # 3
Reading Order Detection        | ReadingBank | TPP (LayoutMask) | Average Page-level BLEU         | 98.16        | # 2
Reading Order Detection        | ReadingBank | TPP (LayoutMask) | Average Relative Distance (ARD) | 0.37         | # 1


