Document AI

23 papers with code • 1 benchmark • 1 dataset



Most implemented papers

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

microsoft/unilm 31 Dec 2019

In this paper, we propose LayoutLM to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents.
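
To make "layout information" concrete: layout-aware models of this kind typically receive, for every OCR word, a bounding box rescaled to a fixed integer grid (0 to 1000 in the LayoutLM family) so that pages of different sizes share one coordinate space. The snippet below is a minimal sketch of that preprocessing step only. The helper name `normalize_bbox`, the sample words, and the page size are illustrative assumptions, not code from the microsoft/unilm repository.

```python
# Minimal sketch: rescale pixel-space word boxes onto a 0..1000 grid.
# normalize_bbox and the sample data below are hypothetical, for illustration.

def normalize_bbox(bbox, page_width, page_height, scale=1000):
    """Map a pixel box (x0, y0, x1, y1) onto a 0..scale integer grid."""
    x0, y0, x1, y1 = bbox
    return [
        int(scale * x0 / page_width),
        int(scale * y0 / page_height),
        int(scale * x1 / page_width),
        int(scale * y1 / page_height),
    ]

# Assumed OCR output: one word and one pixel box per entry.
words = ["Invoice", "Total:", "$42.00"]
boxes = [(50, 40, 210, 80), (50, 900, 160, 940), (400, 900, 520, 940)]
page_width, page_height = 850, 1100

normalized = [normalize_bbox(b, page_width, page_height) for b in boxes]
for word, box in zip(words, normalized):
    print(word, box)
```

Each word is then paired with its normalized box, and the model embeds the coordinates alongside the token so that position on the page becomes part of the input.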

DiT: Self-supervised Pre-training for Document Image Transformer

microsoft/unilm 4 Mar 2022

We leverage DiT as the backbone network in a variety of vision-based Document AI tasks, including document image classification, document layout analysis, table detection as well as text detection for OCR.

Unifying Vision, Text, and Layout for Universal Document Processing

microsoft/i-code CVPR 2023

UDOP leverages the spatial correlation between textual content and document image to model image, text, and layout modalities with one uniform representation.
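
One way to picture a spatial correspondence between text and image is to ask which image patch a word falls into. The sketch below assigns each word box to the patch containing its center, assuming a square image cut into a regular grid. The function name `patch_index`, the 224-pixel image size, and the 16-pixel patch size are assumptions chosen for illustration and are not taken from the microsoft/i-code repository.

```python
# Minimal sketch: find the image patch that contains a word's box center.
# patch_index, image_size and patch_size are hypothetical, for illustration.

def patch_index(bbox, image_size=224, patch_size=16):
    """Return the row-major index of the patch containing the box center."""
    x0, y0, x1, y1 = bbox  # pixel coordinates in the resized image
    center_x = (x0 + x1) / 2
    center_y = (y0 + y1) / 2
    patches_per_row = image_size // patch_size
    # Clamp so a center lying exactly on the far edge stays inside the grid.
    col = min(int(center_x // patch_size), patches_per_row - 1)
    row = min(int(center_y // patch_size), patches_per_row - 1)
    return row * patches_per_row + col

top_left = patch_index((0, 0, 16, 16))
third_in_row = patch_index((16, 0, 48, 16))
bottom_right = patch_index((208, 208, 224, 224))
```

With such an index, a token's text embedding and the matching patch's visual embedding can be combined into a single vector, which is one reading of "one uniform representation".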

LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

microsoft/unilm 18 Apr 2022

In this paper, we propose LayoutLMv3 to pre-train multimodal Transformers for Document AI with unified text and image masking.

Modular Multimodal Machine Learning for Extraction of Theorems and Proofs in Long Scientific Documents (Extended Version)

mv96/mm_extraction 18 Jul 2023

We address the extraction of mathematical statements and their proofs from scholarly PDF articles as a multimodal classification problem, utilizing text, font features, and bitmap image renderings of PDFs as distinct modalities.

LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding

alibabaresearch/advancedliteratemachinery CVPR 2024

The core of LayoutLLM is a layout instruction tuning strategy, which is specially designed to enhance the comprehension and utilization of document layouts.

Document Intelligence Metrics for Visually Rich Document Evaluation

metricsdi/dimetrics 23 May 2022

The processing of Visually-Rich Documents (VRDs) is highly important in information extraction tasks associated with Document Intelligence.

DoSA : A System to Accelerate Annotations on Business Documents with Human-in-the-Loop

neeleshkshukla/dosa 9 Nov 2022

An initial document-specific model can be trained and its inference can be used as feedback for generating more automated annotations.

ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction

MAEHCM/ICL-D3IE ICCV 2023

To this end, we propose a simple but effective in-context learning framework called ICL-D3IE, which enables LLMs to perform DIE with different types of demonstration examples.

GeoLayoutLM: Geometric Pre-training for Visual Information Extraction

alibabaresearch/advancedliteratemachinery CVPR 2023

Additionally, novel relation heads, which are pre-trained by the geometric pre-training tasks and fine-tuned for RE, are elaborately designed to enrich and enhance the feature representation.