Document Understanding
113 papers with code • 0 benchmarks • 2 datasets
Document understanding covers document classification, layout analysis, information extraction, and document question answering (DocQA).
Libraries
Use these libraries to find document understanding models and implementations.
Most implemented papers
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents.
LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding
In this paper, we present LayoutXLM, a multimodal pre-trained model for multilingual document understanding, which aims to bridge the language barriers for visually-rich document understanding.
Chargrid: Towards Understanding 2D Documents
We introduce a novel type of text representation that preserves the 2D layout of a document.
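To make the idea of a layout-preserving text representation concrete, here is a minimal sketch of a character grid. It assumes the input is a list of characters with pixel bounding boxes (for example, from an OCR engine) and fills each box with an integer index for that character, leaving 0 for background. The names `build_vocab` and `build_chargrid`, the box format, and the index assignment are illustrative assumptions, not the paper's implementation.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical input format: (character, x0, y0, x1, y1), with x1/y1 exclusive.
CharBox = Tuple[str, int, int, int, int]


def build_vocab(chars: Iterable[str]) -> Dict[str, int]:
    """Assign each distinct character an index >= 1; 0 is reserved for background."""
    return {ch: i + 1 for i, ch in enumerate(sorted(set(chars)))}


def build_chargrid(
    boxes: List[CharBox], width: int, height: int
) -> Tuple[List[List[int]], Dict[str, int]]:
    """Rasterize character boxes into a height x width grid of character indices."""
    vocab = build_vocab(ch for ch, *_ in boxes)
    grid = [[0] * width for _ in range(height)]
    for ch, x0, y0, x1, y1 in boxes:
        # Clip each box to the page so out-of-range coordinates are ignored.
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                grid[y][x] = vocab[ch]
    return grid, vocab


if __name__ == "__main__":
    demo_boxes = [("A", 0, 0, 2, 2), ("B", 3, 0, 5, 2)]
    demo_grid, demo_vocab = build_chargrid(demo_boxes, width=6, height=3)
    for row in demo_grid:
        print(row)
```

A grid like this can be one-hot encoded per cell and passed to a convolutional model in the same way as an image, which is why the 2D layout survives where a flat token sequence would lose it.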
OCR-free Document Understanding Transformer
Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs.
LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding
LiLT can be pre-trained on the structured documents of a single language and then directly fine-tuned on other languages with the corresponding off-the-shelf monolingual/multilingual pre-trained textual models.
Unifying Vision, Text, and Layout for Universal Document Processing
UDOP leverages the spatial correlation between textual content and document image to model image, text, and layout modalities with one uniform representation.
ICDAR 2021 Competition on Scientific Literature Parsing
Scientific literature contains important information related to cutting-edge innovations in diverse domains.
Message Passing Attention Networks for Document Understanding
In this paper, we represent documents as word co-occurrence networks and propose an application of the message passing framework to NLP, the Message Passing Attention network for Document understanding (MPAD).
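As a rough illustration of what a word co-occurrence network looks like, the sketch below builds one from a tokenized document using a sliding window. It assumes an undirected, weighted graph in which two distinct words are linked whenever they appear within `window` positions of each other; the function name `cooccurrence_graph`, the window size, and the undirected edges are simplifying assumptions and may differ from the construction used in MPAD. The message passing step itself is not shown.

```python
from collections import Counter
from typing import Counter as CounterType, List, Tuple

Edge = Tuple[str, str]


def cooccurrence_graph(tokens: List[str], window: int = 2) -> CounterType[Edge]:
    """Count co-occurrences of distinct words within `window` following positions.

    Edges are stored as alphabetically sorted pairs, so the graph is undirected.
    """
    edges: CounterType[Edge] = Counter()
    for i, word in enumerate(tokens):
        # Look ahead at most `window` tokens from position i.
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != word:  # skip self-loops
                u, v = sorted((word, other))
                edges[(u, v)] += 1
    return edges


if __name__ == "__main__":
    graph = cooccurrence_graph("the cat sat on the mat".split(), window=2)
    for (u, v), weight in sorted(graph.items()):
        print(u, v, weight)
```

The nodes of the resulting graph are the unique words of the document, and the edge weights give a message passing model a notion of which words are close in the text.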
MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding
Multimodal pre-training with text, layout, and image has made significant progress for Visually Rich Document Understanding (VRDU), especially for fixed-layout documents such as scanned document images.
End-to-end Document Recognition and Understanding with Dessurt
Dessurt is a more flexible model than prior methods and is able to handle a variety of document domains and tasks.