61 papers with code • 0 benchmarks • 1 datasets
Document understanding involves document classification, layout analysis, information extraction, and DocQA.
These leaderboards are used to track progress in document understanding
Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents.
In this paper, we present LayoutXLM, a multimodal pre-trained model for multilingual document understanding, which aims to bridge the language barriers for visually-rich document understanding.
In this paper, we represent documents as word co-occurrence networks and propose an application of the message passing framework to NLP, the Message Passing Attention network for Document understanding (MPAD).
Multimodal pre-training with text, layout, and image has made significant progress for Visually Rich Document Understanding (VRDU), especially the fixed-layout documents such as scanned document images.
LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding
LiLT can be pre-trained on the structured documents of a single language and then directly fine-tuned on other languages with the corresponding off-the-shelf monolingual/multilingual pre-trained textual models.
Recent years have witnessed the rise and success of pre-training techniques in visually-rich document understanding.