Document Layout Analysis

36 papers with code • 4 benchmarks • 9 datasets

"Document Layout Analysis is performed to determine physical structure of a document, that is, to determine document components. These document components can consist of single connected components-regions [...] of pixels that are adjacent to form single regions [...] , or group of text lines. A text line is a group of characters, symbols, and words that are adjacent, “relatively close” to each other and through which a straight line can be drawn (usually with horizontal or vertical orientation)." L. O'Gorman, "The document spectrum for page layout analysis," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15, no. 11, pp. 1162-1173, Nov. 1993.
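The two notions in the definition above, connected components of pixels and text lines as groups of nearby components, can be illustrated with a small sketch. This is a simplified illustration, not the docstrum method from the cited paper: the function names `connected_components` and `group_into_lines`, the `max_gap` threshold, the toy `page` grid, and the restriction to horizontal lines are all assumptions made for the example.

```python
from collections import deque


def connected_components(grid):
    """Return bounding boxes (top, left, bottom, right) of 4-connected regions of 1s."""
    rows = len(grid)
    cols = len(grid[0]) if grid else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search over adjacent foreground pixels.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def group_into_lines(boxes, max_gap=2):
    """Group boxes into horizontal text lines (hypothetical heuristic).

    A box joins a line if their row ranges overlap and the horizontal gap
    to the line's current right edge is at most max_gap pixels.
    """
    lines = []
    for box in sorted(boxes, key=lambda b: b[1]):  # process left to right
        top, left, bottom, right = box
        for line in lines:
            l_top, l_left, l_bottom, l_right = line["bbox"]
            overlaps = top <= l_bottom and bottom >= l_top
            close = left - l_right - 1 <= max_gap
            if overlaps and close:
                line["boxes"].append(box)
                line["bbox"] = (min(l_top, top), l_left,
                                max(l_bottom, bottom), max(l_right, right))
                break
        else:
            lines.append({"bbox": box, "boxes": [box]})
    return lines


# Toy binary page: 1 = ink, 0 = background.
page = [
    [1, 1, 0, 1, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0, 0, 1],
]

boxes = connected_components(page)
lines = group_into_lines(boxes, max_gap=2)
print(len(boxes), "components grouped into", len(lines), "lines")
```

On the toy page, the component at the far right of the last row is left as its own line because its gap to the nearest neighbor exceeds `max_gap`. Practical systems tune such thresholds per document or learn the grouping instead.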

Image credit: PubLayNet: largest dataset ever for document layout analysis

Benchmarks

These leaderboards are used to track progress in Document Layout Analysis

Dataset	Best Model
PubLayNet val	VGT
RVL-CDIP	VisualWordGrid
Document Layout Recognition Challenge test	USYD NLP_CS29-2
Document Layout Recognition Challenge mini-dev	Faster_RCNN

Libraries

Use these libraries to find Document Layout Analysis models and implementations

huggingface/transformers • 6 papers • 125,425 stars
microsoft/unilm • 3 papers • 18,378 stars
facebookresearch/data2vec_vision • 3 papers
PaddlePaddle/PaddleOCR • 2 papers • 38,665 stars

See all 8 libraries.

Datasets

Subtasks

MS-SSIM

Latest papers

BaDLAD: A Large Multi-Domain Bengali Document Layout Analysis Dataset

anon-user-for-web/badlad • 9 Mar 2023

While strides have been made in deep learning based Bengali Optical Character Recognition (OCR) in the past decade, the absence of large Document Layout Analysis (DLA) datasets has hindered the application of OCR in document transcription, e.g., transcribing historical documents and newspapers.

CTE: A Dataset for Contextualized Table Extraction

ailab-unifi/cte-dataset • 2 Feb 2023

We define the task of Contextualized Table Extraction (CTE), which aims to extract and define the structure of tables considering the textual context of the document.

M6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis

hciilab/m6doc • CVPR 2023

Document layout analysis is a crucial prerequisite for document understanding, including document retrieval and conversion.

01 Jan 2023

Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks

andreagemelli/doc2graph • 23 Aug 2022

Geometric Deep Learning has recently attracted significant interest in a wide range of machine learning fields, including document analysis.

106 stars

Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis

adlnlp/doc_gcn • COLING 2022

Recognizing the layout of unstructured digital documents is crucial when parsing the documents into the structured, machine-readable format for downstream applications.

22 Aug 2022

DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis

DS4SD/DocLayNet • 2 Jun 2022

Lastly, we compare models trained on PubLayNet, DocBank and DocLayNet, showing that layout predictions of the DocLayNet-trained models are more robust and thus the preferred choice for general-purpose document-layout analysis.

174 stars

LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

huggingface/transformers • 18 Apr 2022

In this paper, we propose LayoutLMv3 to pre-train multimodal Transformers for Document AI with unified text and image masking.

125,425 stars

Towards End-to-End Unified Scene Text Detection and Layout Analysis

tensorflow/models • CVPR 2022

In this paper, we bring them together and introduce the task of unified scene text detection and layout analysis.

76,621 stars

28 Mar 2022

DiT: Self-supervised Pre-training for Document Image Transformer

huggingface/transformers • 4 Mar 2022

We leverage DiT as the backbone network in a variety of vision-based Document AI tasks, including document image classification, document layout analysis, table detection as well as text detection for OCR.

125,425 stars

DocSegTr: An Instance-Level End-to-End Document Image Segmentation Transformer

biswassanket/docsegtr • 27 Jan 2022

Instance-level segmentation of different document objects (title, sections, figures, etc.) has emerged as an interesting problem for the document analysis and understanding community.
