Document Layout Analysis
36 papers with code • 4 benchmarks • 9 datasets
"Document Layout Analysis is performed to determine physical structure of a document, that is, to determine document components. These document components can consist of single connected components-regions [...] of pixels that are adjacent to form single regions [...] , or group of text lines. A text line is a group of characters, symbols, and words that are adjacent, “relatively close” to each other and through which a straight line can be drawn (usually with horizontal or vertical orientation)." L. O'Gorman, "The document spectrum for page layout analysis," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15, no. 11, pp. 1162-1173, Nov. 1993.
Image credit: PubLayNet: largest dataset ever for document layout analysis
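The text-line definition quoted above (adjacent, "relatively close" components that a straight line can pass through) can be illustrated with a small sketch. This is not O'Gorman's docstrum method, which relies on nearest-neighbor clustering. It is a simplified greedy grouping that assumes horizontal text and axis-aligned bounding boxes. The `Box` type, the `group_into_lines` function, and the `max_gap` and `min_overlap` thresholds are hypothetical names and values chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of one connected component, in pixels."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def height(self) -> float:
        return self.y1 - self.y0


def vertical_overlap(a: Box, b: Box) -> float:
    """Fraction of the shorter box's height that both boxes share."""
    shared = min(a.y1, b.y1) - max(a.y0, b.y0)
    shorter = min(a.height, b.height)
    return max(shared, 0.0) / shorter if shorter > 0 else 0.0


def group_into_lines(
    boxes: List[Box],
    max_gap: float = 15.0,      # assumed threshold for "relatively close"
    min_overlap: float = 0.5,   # assumed threshold for "on the same line"
) -> List[List[Box]]:
    """Greedily attach each box to the first line whose last box is
    horizontally near and vertically aligned; otherwise start a new line."""
    lines: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: (b.x0, b.y0)):
        for line in lines:
            last = line[-1]
            close = box.x0 - last.x1 <= max_gap
            aligned = vertical_overlap(last, box) >= min_overlap
            if close and aligned:
                line.append(box)
                break
        else:
            lines.append([box])
    # Return lines in top-to-bottom order.
    lines.sort(key=lambda line: min(b.y0 for b in line))
    return lines
```

Both thresholds would need tuning per scan resolution and font size. Vertical or skewed text would require a different alignment test than the vertical-overlap check used here.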
Latest papers
Text Role Classification in Scientific Charts Using Multimodal Transformers
The models are evaluated on various chart datasets, and results show that LayoutLMv3 outperforms UDOP in all experiments.
Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis
Our end-to-end system achieves state-of-the-art performance on two large-scale document layout analysis datasets (PubLayNet and DocLayNet), a high-quality hierarchical document structure reconstruction dataset (HRDoc), and our Comp-HRDoc benchmark.
DCQA: Document-Level Chart Question Answering towards Complex Reasoning and Common-Sense Understanding
Our DCQA dataset is expected to foster research on understanding visualizations in documents, especially for scenarios that require complex reasoning for charts in the visually-rich document.
DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond
In this report, we introduce DocXChain, a powerful open-source toolchain for document parsing, which is designed and developed to automatically convert the rich information embodied in unstructured documents, such as text, tables and charts, into structured representations that are readable and manipulable by machines.
appjsonify: An Academic Paper PDF-to-JSON Conversion Toolkit
We present appjsonify, a Python-based PDF-to-JSON conversion toolkit for academic papers.
Vision Grid Transformer for Document Layout Analysis
Document pre-trained models and grid-based models have proven to be very effective on various tasks in Document AI.
Document AI: A Comparative Study of Transformer-Based, Graph-Based Models, and Convolutional Neural Networks For Document Layout Analysis
In this study, we aim to fill these gaps by conducting a comparative evaluation of state-of-the-art models in document layout analysis and investigating the potential of cross-lingual layout analysis by utilizing machine translation techniques.
A Graphical Approach to Document Layout Analysis
Document layout analysis (DLA) is the task of detecting the distinct, semantic content within a document and correctly classifying these items into an appropriate category (e.g., text, title, figure).
SelfDocSeg: A Self-Supervised vision-based Approach towards Document Segmentation
Document layout analysis is a well-known problem in the document research community and has been widely explored, yielding a multitude of solutions ranging from text mining and recognition to graph-based representation, visual feature extraction, etc.
PARAGRAPH2GRAPH: A GNN-based framework for layout paragraph analysis
Document layout analysis has a wide range of requirements across various domains, languages, and business scenarios.