Document Understanding

113 papers with code • 0 benchmarks • 2 datasets

Document understanding involves document classification, layout analysis, information extraction, and document question answering (DocQA).

Most implemented papers

LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding

microsoft/unilm ACL 2021

Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents.
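
Text-and-layout models of this family consume each word together with its bounding box, commonly expressed in coordinates normalized to a 0 to 1000 range so that boxes from pages of different sizes share one scale. The sketch below shows only that preprocessing step. It assumes boxes arrive as (x0, y0, x1, y1) in page units; `normalize_bbox` is a hypothetical helper, not code from microsoft/unilm.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in page units


def normalize_bbox(bbox: Box, page_width: float, page_height: float,
                   scale: int = 1000) -> Tuple[int, int, int, int]:
    """Rescale a word box from page units to integers in [0, scale]."""
    if page_width <= 0 or page_height <= 0:
        raise ValueError("page dimensions must be positive")
    x0, y0, x1, y1 = bbox

    def to_grid(value: float, extent: float) -> int:
        # Truncate to an integer and clamp, so boxes spilling past the page edge stay in range.
        return max(0, min(scale, int(scale * value / extent)))

    return (to_grid(x0, page_width), to_grid(y0, page_height),
            to_grid(x1, page_width), to_grid(y1, page_height))


# A word box on a 612 x 792 pt (US Letter) page.
print(normalize_bbox((153, 198, 306, 396), 612, 792))  # (250, 250, 500, 500)
```

The integer coordinates can then index learned position-embedding tables, which is why they are kept bounded.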

LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding

microsoft/unilm 18 Apr 2021

In this paper, we present LayoutXLM, a multimodal pre-trained model for multilingual document understanding, which aims to bridge the language barriers for visually-rich document understanding.

Chargrid: Towards Understanding 2D Documents

antoinedelplace/chargrid EMNLP 2018

We introduce a novel type of text representation that preserves the 2D layout of a document.
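
The idea of a layout-preserving text representation can be sketched as a 2D grid in which every cell covered by a character's box holds that character's index, with 0 for background. This is a simplified illustration under stated assumptions: boxes are already given in grid cells, and the `build_chargrid` helper and its "unknown" index are hypothetical, not taken from antoinedelplace/chargrid. In practice such an index grid would typically be one-hot encoded before it is passed to a convolutional network.

```python
from typing import Dict, List, Tuple

# One character with its box (x0, y0, x1, y1) in grid cells; x1 and y1 are exclusive.
CharBox = Tuple[str, Tuple[int, int, int, int]]


def build_chargrid(chars: List[CharBox], height: int, width: int,
                   vocab: Dict[str, int]) -> List[List[int]]:
    """Fill every cell covered by a character's box with that character's index.

    Index 0 marks background. Characters missing from `vocab` get the
    hypothetical "unknown" index len(vocab) + 1.
    """
    unknown = len(vocab) + 1
    grid = [[0] * width for _ in range(height)]
    for ch, (x0, y0, x1, y1) in chars:
        idx = vocab.get(ch, unknown)
        # Clip the box to the grid, then paint the covered cells.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                grid[y][x] = idx
    return grid


vocab = {c: i + 1 for i, c in enumerate("abcdefghijklmnopqrstuvwxyz")}
# "hi": a wide 'h' followed by a narrow 'i' on a 3 x 4 grid.
grid = build_chargrid([("h", (0, 0, 2, 2)), ("i", (2, 0, 3, 2))], 3, 4, vocab)
for row in grid:
    print(row)
# [8, 8, 9, 0]
# [8, 8, 9, 0]
# [0, 0, 0, 0]
```

Unlike a flat token sequence, the grid keeps the horizontal and vertical neighbourhood of each character intact.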

OCR-free Document Understanding Transformer

clovaai/donut 30 Nov 2021

Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs.

LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding

jpwang/lilt ACL 2022

LiLT can be pre-trained on the structured documents of a single language and then directly fine-tuned on other languages with the corresponding off-the-shelf monolingual/multilingual pre-trained textual models.

Unifying Vision, Text, and Layout for Universal Document Processing

microsoft/i-code CVPR 2023

UDOP leverages the spatial correlation between textual content and document image to model image, text, and layout modalities with one uniform representation.
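
One way to exploit the spatial correlation between text and image, roughly following the idea described for UDOP, is to pair each text token with the image patch that contains the centre of its bounding box and sum the two embeddings. The sketch below is an assumption-laden illustration with plain Python lists: the patch grid is row-major, `patch_index` and `fuse_text_with_patches` are hypothetical names, and it is not code from microsoft/i-code.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in image pixels


def patch_index(box: Box, image_w: int, image_h: int, patch: int) -> int:
    """Row-major index of the patch that contains the centre of `box`."""
    cols, rows = image_w // patch, image_h // patch
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    # Clamp so centres on the far edge still map to the last row/column.
    col = min(max(int(cx // patch), 0), cols - 1)
    row = min(max(int(cy // patch), 0), rows - 1)
    return row * cols + col


def fuse_text_with_patches(text_emb: Sequence[Sequence[float]], boxes: Sequence[Box],
                           patch_emb: Sequence[Sequence[float]],
                           image_w: int, image_h: int, patch: int) -> List[List[float]]:
    """Return one vector per text token: its embedding plus the embedding of its patch."""
    if len(text_emb) != len(boxes):
        raise ValueError("need exactly one box per text token")
    fused = []
    for vec, box in zip(text_emb, boxes):
        p = patch_emb[patch_index(box, image_w, image_h, patch)]
        fused.append([t + v for t, v in zip(vec, p)])
    return fused


# A 4 x 4 image cut into 2 x 2 patches gives patch indices 0..3 in row-major order.
patches = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tokens = [[0.5, 0.5], [1.0, 1.0]]
boxes = [(0, 0, 1, 1), (2, 2, 4, 4)]
print(fuse_text_with_patches(tokens, boxes, patches, 4, 4, 2))
# [[0.5, 0.5], [2.0, 2.0]]
```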

ICDAR 2021 Competition on Scientific Literature Parsing

ibm-aur-nlp/PubLayNet 8 Jun 2021

Scientific literature contains important information related to cutting-edge innovations in diverse domains.

Message Passing Attention Networks for Document Understanding

giannisnik/mpad 17 Aug 2019

In this paper, we represent documents as word co-occurrence networks and propose an application of the message passing framework to NLP, the Message Passing Attention network for Document understanding (MPAD).
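
A word co-occurrence network can be built by sliding a window over the token sequence and adding a directed, count-weighted edge from each word to the words that follow it inside the window. The sketch below assumes a window of 2 tokens and then applies one plain message-passing step (an edge-weighted sum of incoming neighbour vectors). That step only illustrates the framework; it is not the attention-based update used in MPAD, and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edges = Dict[Tuple[str, str], int]


def cooccurrence_graph(tokens: List[str], window: int = 2) -> Edges:
    """Directed edges u -> v, weighted by how often v follows u within `window` tokens."""
    edges: Dict[Tuple[str, str], int] = defaultdict(int)
    for i, u in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            edges[(u, tokens[j])] += 1
    return dict(edges)


def message_pass(edges: Edges, features: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """One plain step: each node receives the edge-weighted sum of its in-neighbours' vectors.

    Assumes `features` has a vector for every node that appears in `edges`.
    """
    if not features:
        return {}
    dim = len(next(iter(features.values())))
    out = {node: [0.0] * dim for node in features}
    for (u, v), weight in edges.items():
        for k in range(dim):
            out[v][k] += weight * features[u][k]
    return out


g = cooccurrence_graph("a b a b".split())
print(g)  # {('a', 'b'): 2, ('b', 'a'): 1}
print(message_pass(g, {"a": [1.0, 0.0], "b": [0.0, 1.0]}))
# {'a': [0.0, 1.0], 'b': [2.0, 0.0]}
```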

MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding

microsoft/unilm 16 Oct 2021

Multimodal pre-training with text, layout, and image has made significant progress for Visually Rich Document Understanding (VRDU), especially for fixed-layout documents such as scanned document images.
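
For markup-language documents, the position of a text node can be described by its XPath instead of a pixel box, which is the kind of signal MarkupLM pairs with the text. The sketch below extracts (XPath, text) pairs with Python's standard `html.parser`. It assumes well-nested HTML and always writes a 1-based sibling index for every step; `XPathCollector` and `extract_xpaths` are hypothetical helpers, not part of microsoft/unilm.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

# Elements that never get an end tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class XPathCollector(HTMLParser):
    """Collect (xpath, text) pairs for every non-empty text node."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[Tuple[str, int]] = []          # open elements as (tag, sibling index)
        self.child_counts: List[Dict[str, int]] = [{}]  # per open element: child tags seen so far
        self.nodes: List[Tuple[str, str]] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag in VOID_TAGS:
            return
        counts = self.child_counts[-1]
        counts[tag] = counts.get(tag, 0) + 1
        self.stack.append((tag, counts[tag]))
        self.child_counts.append({})

    def handle_endtag(self, tag: str) -> None:
        # Sketch assumes well-nested HTML: only close the element on top of the stack.
        if self.stack and self.stack[-1][0] == tag:
            self.stack.pop()
            self.child_counts.pop()

    def handle_data(self, data: str) -> None:
        text = data.strip()
        if text and self.stack:
            xpath = "".join(f"/{tag}[{idx}]" for tag, idx in self.stack)
            self.nodes.append((xpath, text))


def extract_xpaths(html: str) -> List[Tuple[str, str]]:
    parser = XPathCollector()
    parser.feed(html)
    parser.close()
    return parser.nodes


page = "<html><body><div>Title</div><div><span>Price</span> $5</div></body></html>"
for xpath, text in extract_xpaths(page):
    print(xpath, "->", text)
# /html[1]/body[1]/div[1] -> Title
# /html[1]/body[1]/div[2]/span[1] -> Price
# /html[1]/body[1]/div[2] -> $5
```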

End-to-end Document Recognition and Understanding with Dessurt

herobd/dessurt 30 Mar 2022

Dessurt is a more flexible model than prior methods and is able to handle a variety of document domains and tasks.