Document AI

15 papers with code • 1 benchmark • 1 dataset

Most implemented papers

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

microsoft/unilm 31 Dec 2019

In this paper, we propose LayoutLM to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents.
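To make "jointly model text and layout" concrete, here is a minimal sketch, assuming word boxes are rescaled to a 0-1000 grid and each token's input vector is its word embedding plus 2D position embeddings looked up from the box corners (x0, y0, x1, y1). The toy sizes, random tables and the names `normalize_box` and `embed_token` are hypothetical; real models also add 1D position embeddings and other terms.

```python
import numpy as np

def normalize_box(box, width, height):
    """Rescale a pixel-space box (x0, y0, x1, y1) to an integer 0-1000 grid."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]

# Hypothetical toy embedding tables; a trained model would learn these.
rng = np.random.default_rng(0)
VOCAB, HIDDEN, GRID = 100, 16, 1001
word_emb = rng.normal(size=(VOCAB, HIDDEN))
x_emb = rng.normal(size=(GRID, HIDDEN))  # shared by x0 and x1
y_emb = rng.normal(size=(GRID, HIDDEN))  # shared by y0 and y1

def embed_token(token_id, norm_box):
    """Sum the token's word embedding with the embeddings of its box coordinates."""
    x0, y0, x1, y1 = norm_box
    return word_emb[token_id] + x_emb[x0] + y_emb[y0] + x_emb[x1] + y_emb[y1]

# A word at pixels (50, 100)-(150, 120) on a 500 x 1000 page.
box = normalize_box((50, 100, 150, 120), width=500, height=1000)
vec = embed_token(7, box)
```

Here `box` becomes `[100, 100, 300, 120]`, so two tokens with the same text but different positions on the page receive different input vectors.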

DiT: Self-supervised Pre-training for Document Image Transformer

microsoft/unilm 4 Mar 2022

We leverage DiT as the backbone network in a variety of vision-based Document AI tasks, including document image classification, document layout analysis, table detection as well as text detection for OCR.
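As an image-only backbone, DiT consumes the document image rather than OCR text. Assuming a ViT-style input pipeline (the 224 x 224 input and 16 x 16 patch size below are illustrative assumptions, and `patchify` is a hypothetical name), the first step is to cut the page into flattened, non-overlapping patches that become the Transformer's input tokens:

```python
import numpy as np

def patchify(image, patch=16):
    """Split an (H, W, C) image into row-major flattened patches of shape (N, patch*patch*C)."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image size must be a multiple of the patch size"
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    grid = grid.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, C)
    return grid.reshape(-1, patch * patch * c)

page = np.arange(224 * 224 * 3, dtype=np.float32).reshape(224, 224, 3)
tokens = patchify(page)
```

A 224 x 224 RGB page yields 196 patch tokens of length 768, which a downstream head can then use for classification, layout analysis or detection.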

LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

microsoft/unilm 18 Apr 2022

In this paper, we propose LayoutLMv3 to pre-train multimodal Transformers for Document AI with unified text and image masking.
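"Unified text and image masking" means both modalities are corrupted and the model learns to reconstruct them. The sketch below only shows the input side: choosing which text tokens and which image patches to mask. It uses plain uniform random sampling for both, which is a simplification of whatever masking schedule the paper uses; the ratios, `MASK_ID` and `unified_mask` are hypothetical.

```python
import random

MASK_ID = -1  # hypothetical placeholder id for a masked text token

def sample_positions(n, ratio, rng):
    """Pick round(n * ratio) distinct positions uniformly at random."""
    k = int(round(n * ratio))
    return sorted(rng.sample(range(n), k))

def unified_mask(text_ids, num_patches, text_ratio=0.3, image_ratio=0.4, seed=0):
    """Mask text tokens and image patches in one step; return targets for both objectives."""
    rng = random.Random(seed)
    text_pos = sample_positions(len(text_ids), text_ratio, rng)
    image_pos = sample_positions(num_patches, image_ratio, rng)
    masked_text = list(text_ids)
    for i in text_pos:
        masked_text[i] = MASK_ID  # the model must predict the original id here
    return masked_text, text_pos, image_pos

masked, text_pos, image_pos = unified_mask(list(range(100, 110)), num_patches=196)
```

The masked text goes to a masked-language-modeling loss and `image_pos` marks the patches whose content a masked-image-modeling loss would reconstruct.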

Unifying Vision, Text, and Layout for Universal Document Processing

microsoft/i-code CVPR 2023

UDOP leverages the spatial correlation between textual content and document image to model image, text, and layout modalities with one uniform representation.
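One way to exploit the spatial correlation between text and image is to add each text token's embedding to the embedding of the image patch that contains the center of the token's box, keeping the remaining patches as separate image tokens. The sketch below assumes that scheme, a 14 x 14 patch grid and 0-1000 normalized boxes; `patch_index` and `fuse_layout` are hypothetical names, and this is not the UDOP code.

```python
import numpy as np

def patch_index(norm_box, grid=14, scale=1000):
    """Index of the grid cell (row-major) containing the box center."""
    x0, y0, x1, y1 = norm_box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    col = min(int(cx / scale * grid), grid - 1)
    row = min(int(cy / scale * grid), grid - 1)
    return row * grid + col

def fuse_layout(text_emb, boxes, patch_emb, grid=14):
    """Add each token's patch embedding to its text embedding; return fused tokens and unused patches."""
    fused = text_emb.copy()
    used = set()
    for i, box in enumerate(boxes):
        p = patch_index(box, grid)
        fused[i] += patch_emb[p]
        used.add(p)
    leftover = [p for p in range(patch_emb.shape[0]) if p not in used]
    return fused, patch_emb[leftover]

rng = np.random.default_rng(1)
text = rng.normal(size=(2, 8))
patches = rng.normal(size=(196, 8))
boxes = [(0, 0, 10, 10), (990, 990, 1000, 1000)]  # top-left and bottom-right words
fused, rest = fuse_layout(text, boxes, patches)
```

The first word lands in patch 0 and the second in patch 195, so the sequence fed to the model holds 2 fused tokens plus 194 image-only tokens.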

Document Intelligence Metrics for Visually Rich Document Evaluation

metricsdi/dimetrics 23 May 2022

The processing of Visually-Rich Documents (VRDs) is highly important in information extraction tasks associated with Document Intelligence.

DoSA : A System to Accelerate Annotations on Business Documents with Human-in-the-Loop

neeleshkshukla/dosa 9 Nov 2022

An initial document-specific model can be trained, and its predictions can be used as feedback for generating more automated annotations.
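The feedback loop described above can be sketched generically: train on what is labeled, pre-annotate the next batch, let a reviewer correct the draft, and retrain. The token-lookup "model", the `review` callback and the batch size are hypothetical stand-ins chosen to keep the sketch self-contained; this is not DoSA's implementation.

```python
from collections import Counter, defaultdict

def train(labeled_docs):
    """Toy model: map each token to the label it most often received."""
    counts = defaultdict(Counter)
    for doc in labeled_docs:
        for token, label in doc:
            counts[token][label] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in counts.items()}

def predict(model, tokens, default="O"):
    """Pre-annotate tokens; unseen tokens get the default label."""
    return [(tok, model.get(tok, default)) for tok in tokens]

def annotation_loop(seed_docs, unlabeled_docs, review, batch_size=2):
    """Retrain after each batch; record how many labels the reviewer had to fix per document."""
    labeled, queue, corrections = list(seed_docs), list(unlabeled_docs), []
    while queue:
        model = train(labeled)
        batch, queue = queue[:batch_size], queue[batch_size:]
        for tokens in batch:
            draft = predict(model, tokens)
            final = review(draft)  # human-in-the-loop correction
            corrections.append(sum(d != f for d, f in zip(draft, final)))
            labeled.append(final)
    return labeled, corrections

# Simulated reviewer backed by hypothetical gold labels.
GOLD = {"Total": "KEY", "Date": "KEY", "99.00": "VALUE", "2022-11-09": "VALUE", "10.00": "VALUE"}
review = lambda draft: [(tok, GOLD.get(tok, "O")) for tok, _ in draft]
seed = [[("Invoice", "O"), ("Total", "KEY"), ("42.00", "VALUE")]]
docs = [["Total", "99.00"], ["Date", "2022-11-09"], ["Total", "10.00"]]
labeled, corrections = annotation_loop(seed, docs, review)
```

In this toy run the reviewer fixes 1, 2 and 1 labels; a real system would use a learned model whose drafts improve as corrected documents accumulate.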

ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction


To this end, we propose a simple but effective in-context learning framework called ICL-D3IE, which enables large language models (LLMs) to perform document information extraction (DIE) with different types of demonstration examples.
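In-context learning means the LLM sees worked examples in its prompt instead of being fine-tuned. A minimal sketch of assembling such a prompt from several groups of demonstrations follows; the group names, the `Document:`/`Answer:` format, `DEFAULT_INSTRUCTION` and `build_prompt` are hypothetical, and the demonstration-updating step of ICL-D3IE is not modeled.

```python
DEFAULT_INSTRUCTION = "Extract the key-value pairs from the document as JSON."

def build_prompt(demonstrations, query_text, instruction=DEFAULT_INSTRUCTION):
    """Concatenate grouped (document, answer) demonstrations, then the unanswered query."""
    parts = [instruction]
    for kind, examples in demonstrations.items():
        parts.append(f"### {kind} examples")
        for text, answer in examples:
            parts.append(f"Document: {text}\nAnswer: {answer}")
    parts.append(f"Document: {query_text}\nAnswer:")  # the LLM completes this line
    return "\n\n".join(parts)

demos = {
    "Key-value": [("Total: 42.00", '{"total": "42.00"}')],
    "Layout": [("Date (top right): 2022-11-09", '{"date": "2022-11-09"}')],
}
prompt = build_prompt(demos, "Total: 10.00")
```

Keeping demonstration types in separate groups makes it easy to swap out one kind of example without touching the others.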

GeoLayoutLM: Geometric Pre-training for Visual Information Extraction

alibabaresearch/advancedliteratemachinery CVPR 2023

Additionally, novel relation heads, which are pre-trained by the geometric pre-training tasks and fine-tuned for relation extraction (RE), are carefully designed to enrich the feature representation.
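Relation extraction in this setting means predicting links between entities, for example which value belongs to which key. As a generic illustration only, a bilinear head scores every ordered entity pair as e_i^T W e_j; GeoLayoutLM's relation heads are pre-trained with its geometric tasks, which this sketch does not model. `relation_scores` and `predict_links` are hypothetical names.

```python
import numpy as np

def relation_scores(entity_feats, weight):
    """Bilinear pair scores: score[i, j] = e_i^T W e_j."""
    return entity_feats @ weight @ entity_feats.T

def predict_links(scores, threshold=0.0):
    """Return ordered pairs (i, j), i != j, whose score exceeds the threshold."""
    n = scores.shape[0]
    return [(i, j) for i in range(n) for j in range(n) if i != j and scores[i, j] > threshold]

feats = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # toy entity features
W = np.eye(2)  # a trained head would learn this matrix
scores = relation_scores(feats, W)
links = predict_links(scores, threshold=0.5)
```

With identity weights the score reduces to a dot product, so only the two entities with matching features are linked.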

Context-Aware Chart Element Detection

pengyu965/chartdete 7 May 2023

As a prerequisite of chart data extraction, the accurate detection of basic chart elements is essential.

Document Understanding Dataset and Evaluation (DUDE)

rubenpt91/MP-DocVQA-Framework ICCV 2023

We call on the Document AI (DocAI) community to reevaluate current methodologies and embrace the challenge of creating more practically-oriented benchmarks.