Document Layout Analysis
36 papers with code • 4 benchmarks • 9 datasets
"Document Layout Analysis is performed to determine physical structure of a document, that is, to determine document components. These document components can consist of single connected components-regions [...] of pixels that are adjacent to form single regions [...] , or group of text lines. A text line is a group of characters, symbols, and words that are adjacent, “relatively close” to each other and through which a straight line can be drawn (usually with horizontal or vertical orientation)." L. O'Gorman, "The document spectrum for page layout analysis," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15, no. 11, pp. 1162-1173, Nov. 1993.
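The definition above describes two layers of structure: connected regions of adjacent pixels, and text lines that group nearby components along a straight (here, horizontal) line. The sketch below illustrates that idea on a binary image given as a list of lists (1 = ink, 0 = background). It is a simplified illustration under stated assumptions, not the docstrum method from the cited paper, which works from nearest-neighbour distances and angles between components. The function names `connected_components` and `group_into_lines` and the `max_gap` threshold are hypothetical choices made for this example.

```python
from collections import deque


def connected_components(image):
    """Label 8-connected ink pixels (value 1) and return one bounding box per region.

    Each box is (top, left, bottom, right) in inclusive pixel coordinates.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill over the 8-neighbourhood of each pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def group_into_lines(boxes, max_gap):
    """Greedily group component boxes into horizontal text lines.

    Scanning boxes left to right, a box joins an existing line when its vertical
    extent overlaps that line's rightmost box and the number of empty columns
    between them is at most max_gap (a hypothetical "relatively close" threshold).
    """
    lines = []
    for box in sorted(boxes, key=lambda b: b[1]):
        for line in lines:
            last = line[-1]
            overlaps = box[0] <= last[2] and last[0] <= box[2]
            close = box[1] - last[3] - 1 <= max_gap
            if overlaps and close:
                line.append(box)
                break
        else:
            # No existing line accepted the box, so it starts a new one.
            lines.append([box])
    return lines
```

For example, an image with a 2×2 blob and a short vertical stroke in its top rows, plus two single pixels in its bottom row, yields four components; with `max_gap=3` they form two text lines, while a stricter `max_gap=1` splits the top pair into separate lines. A real system would estimate the gap threshold from the page (for instance, from typical character spacing) rather than fixing it by hand.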
Image credit: PubLayNet: largest dataset ever for document layout analysis
Latest papers
BaDLAD: A Large Multi-Domain Bengali Document Layout Analysis Dataset
While strides have been made in deep learning based Bengali Optical Character Recognition (OCR) in the past decade, the absence of large Document Layout Analysis (DLA) datasets has hindered the application of OCR in document transcription, e.g., transcribing historical documents and newspapers.
CTE: A Dataset for Contextualized Table Extraction
We define the task of Contextualized Table Extraction (CTE), which aims to extract and define the structure of tables considering the textual context of the document.
M6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis
Document layout analysis is a crucial prerequisite for document understanding, including document retrieval and conversion.
Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks
Geometric Deep Learning has recently attracted significant interest in a wide range of machine learning fields, including document analysis.
Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis
Recognizing the layout of unstructured digital documents is crucial when parsing them into a structured, machine-readable format for downstream applications.
DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis
Lastly, we compare models trained on PubLayNet, DocBank and DocLayNet, showing that layout predictions of the DocLayNet-trained models are more robust and thus the preferred choice for general-purpose document-layout analysis.
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
In this paper, we propose LayoutLMv3 to pre-train multimodal Transformers for Document AI with unified text and image masking.
Towards End-to-End Unified Scene Text Detection and Layout Analysis
In this paper, we bring scene text detection and layout analysis together and introduce the task of unified scene text detection and layout analysis.
DiT: Self-supervised Pre-training for Document Image Transformer
We leverage DiT as the backbone network in a variety of vision-based Document AI tasks, including document image classification, document layout analysis, table detection as well as text detection for OCR.
DocSegTr: An Instance-Level End-to-End Document Image Segmentation Transformer
Instance-level segmentation of document objects has emerged as an interesting problem for the document analysis and understanding community.