Document AI

16 papers with code • 1 benchmark • 1 dataset

Benchmarks

These leaderboards are used to track progress in Document AI

Dataset	Best Model
EPHOIE	LayoutLMv3

Libraries

Use these libraries to find Document AI models and implementations

alibabaresearch/advancedliteratemac… • 4 papers • 887 stars

huggingface/transformers • 3 papers • 124,527 stars

Datasets

EPHOIE

Subtasks

document understanding

Most implemented papers

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

microsoft/unilm • 31 Dec 2019

In this paper, we propose LayoutLM to jointly model interactions between text and layout information across scanned document images, which benefits many real-world document image understanding tasks such as information extraction from scanned documents.
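
Modeling text and layout jointly usually means giving the model one bounding box per word alongside the text. The following is a minimal sketch of that pairing, assuming the 0 to 1000 coordinate scale used by the Hugging Face implementation of LayoutLM. The helper `normalize_box`, the sample words, and the page size are illustrative and do not come from the paper.

```python
from typing import List, Tuple

# Pixel-space box as (x0, y0, x1, y1); this layout is an assumption for the sketch.
Box = Tuple[int, int, int, int]


def normalize_box(box: Box, page_width: int, page_height: int, scale: int = 1000) -> List[int]:
    """Rescale a pixel-space box to a page-size-independent 0..scale grid."""
    x0, y0, x1, y1 = box
    return [
        int(scale * x0 / page_width),
        int(scale * y0 / page_height),
        int(scale * x1 / page_width),
        int(scale * y1 / page_height),
    ]


# Hypothetical OCR output for one line of a scanned invoice.
words = ["Invoice", "No.", "12345"]
boxes = [(50, 40, 150, 60), (160, 40, 200, 60), (210, 40, 300, 60)]
page_width, page_height = 1000, 800

# Each word is paired with its normalized box, so text and position travel together.
layout_inputs = [
    (word, normalize_box(box, page_width, page_height))
    for word, box in zip(words, boxes)
]

for word, box in layout_inputs:
    print(word, box)
```

Normalizing to a fixed grid keeps coordinates comparable across scans of different resolutions, which is why the box is rescaled rather than passed in raw pixels.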

DiT: Self-supervised Pre-training for Document Image Transformer

microsoft/unilm • 4 Mar 2022

We leverage DiT as the backbone network in a variety of vision-based Document AI tasks, including document image classification, document layout analysis, table detection as well as text detection for OCR.

LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding

AlibabaResearch/AdvancedLiterateMachinery • 8 Apr 2024

The core of LayoutLLM is a layout instruction tuning strategy, which is specially designed to enhance the comprehension and utilization of document layouts.

LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

microsoft/unilm • 18 Apr 2022

In this paper, we propose LayoutLMv3 to pre-train multimodal Transformers for Document AI with unified text and image masking.

Unifying Vision, Text, and Layout for Universal Document Processing

microsoft/i-code • CVPR 2023

UDOP leverages the spatial correlation between textual content and document image to model image, text, and layout modalities with one uniform representation.

Document Intelligence Metrics for Visually Rich Document Evaluation

metricsdi/dimetrics • 23 May 2022

The processing of Visually-Rich Documents (VRDs) is highly important in information extraction tasks associated with Document Intelligence.

DoSA : A System to Accelerate Annotations on Business Documents with Human-in-the-Loop

neeleshkshukla/dosa • 9 Nov 2022

An initial document-specific model can be trained and its inference can be used as feedback for generating more automated annotations.

ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction

MAEHCM/ICL-D3IE • ICCV 2023

To this end, we propose a simple but effective in-context learning framework called ICL-D3IE, which enables LLMs to perform DIE with different types of demonstration examples.

GeoLayoutLM: Geometric Pre-training for Visual Information Extraction

AlibabaResearch/AdvancedLiterateMachinery • CVPR 2023

Additionally, novel relation heads, which are pre-trained by the geometric pre-training tasks and fine-tuned for RE, are elaborately designed to enrich and enhance the feature representation.

Context-Aware Chart Element Detection

pengyu965/chartdete • 7 May 2023

As a prerequisite of chart data extraction, the accurate detection of chart basic elements is essential and mandatory.
