Document AI
23 papers with code • 1 benchmark • 1 dataset
Most implemented papers
LayoutLM: Pre-training of Text and Layout for Document Image Understanding
In this paper, we propose LayoutLM to jointly model interactions between text and layout information across scanned document images, which benefits many real-world document image understanding tasks such as information extraction from scanned documents.
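LayoutLM feeds each token's 2-D position into the model alongside its text, with bounding boxes normalized to a 0-1000 coordinate grid. A minimal sketch of that normalization step (the function name and page dimensions are illustrative, not from the paper):

```python
def normalize_bbox(bbox, page_width, page_height):
    """Scale an (x0, y0, x1, y1) pixel box to LayoutLM's 0-1000 grid."""
    x0, y0, x1, y1 = bbox
    return (
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    )

# A word box on a 612x792-point (US Letter) page:
print(normalize_bbox((100, 200, 300, 220), 612, 792))  # (163, 252, 490, 277)
```

Each token then carries both its text embedding and embeddings of these four normalized coordinates.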
DiT: Self-supervised Pre-training for Document Image Transformer
We leverage DiT as the backbone network in a variety of vision-based Document AI tasks, including document image classification, document layout analysis, table detection as well as text detection for OCR.
Unifying Vision, Text, and Layout for Universal Document Processing
UDOP leverages the spatial correlation between textual content and document image to model image, text, and layout modalities with one uniform representation.
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
In this paper, we propose LayoutLMv3 to pre-train multimodal Transformers for Document AI with unified text and image masking.
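The unified masking objective hides a fraction of text tokens (and, analogously, image patches) and trains the model to reconstruct them. A simplified sketch of selecting positions to mask, in the style of masked-language-model pre-training (the ratio and helper are illustrative assumptions, not the paper's exact recipe):

```python
import random

def choose_mask_positions(num_tokens, mask_ratio=0.15, seed=0):
    """Pick a random subset of token positions to mask for reconstruction.

    Illustrative helper: real pre-training pipelines apply the same idea
    to both text tokens and image patches.
    """
    rng = random.Random(seed)
    k = max(1, int(num_tokens * mask_ratio))
    return sorted(rng.sample(range(num_tokens), k))

positions = choose_mask_positions(100)
print(len(positions))  # 15 positions out of 100 at a 15% ratio
```

The same selection logic applied to both modalities is what makes the masking "unified": one objective covers text and image streams.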
Modular Multimodal Machine Learning for Extraction of Theorems and Proofs in Long Scientific Documents (Extended Version)
We address the extraction of mathematical statements and their proofs from scholarly PDF articles as a multimodal classification problem, utilizing text, font features, and bitmap image renderings of PDFs as distinct modalities.
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
The core of LayoutLLM is a layout instruction tuning strategy, which is specially designed to enhance the comprehension and utilization of document layouts.
Document Intelligence Metrics for Visually Rich Document Evaluation
The processing of Visually-Rich Documents (VRDs) is highly important in information extraction tasks associated with Document Intelligence.
DoSA: A System to Accelerate Annotations on Business Documents with Human-in-the-Loop
An initial document-specific model can be trained and its inference can be used as feedback for generating more automated annotations.
ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction
To this end, we propose a simple but effective in-context learning framework called ICL-D3IE, which enables LLMs to perform DIE with different types of demonstration examples.
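In-context learning frameworks like this one prepend labeled demonstration documents to the query before sending it to the LLM. A minimal, hypothetical sketch of assembling such a prompt (the function and prompt format are illustrative, not ICL-D3IE's actual templates):

```python
def build_die_prompt(demonstrations, query_text):
    """Assemble an in-context extraction prompt from labeled demonstrations.

    demonstrations: list of (document_text, {field: value}) pairs.
    Illustrative format only; real frameworks select and order diverse
    demonstrations deliberately.
    """
    parts = []
    for text, entities in demonstrations:
        labels = "; ".join(f"{k}: {v}" for k, v in entities.items())
        parts.append(f"Document: {text}\nEntities: {labels}")
    parts.append(f"Document: {query_text}\nEntities:")
    return "\n\n".join(parts)

demos = [
    ("Invoice No. 4411, Total $92.50", {"invoice_no": "4411", "total": "$92.50"}),
    ("Receipt #88, Amount due 14.00", {"invoice_no": "88", "total": "14.00"}),
]
prompt = build_die_prompt(demos, "Invoice No. 7702, Total $310.00")
print(prompt.endswith("Entities:"))  # True: the model completes the labels
```

The LLM then continues the prompt, filling in the entity labels for the query document.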
GeoLayoutLM: Geometric Pre-training for Visual Information Extraction
Additionally, novel relation heads, which are pre-trained on the geometric pre-training tasks and fine-tuned for RE, are carefully designed to enrich and enhance the feature representation.