Document Image Classification

24 papers with code • 8 benchmarks • 4 datasets

Document image classification is the task of classifying documents based on images of their contents.

(Image credit: Real-Time Document Image Classification using Deep CNN and Extreme Learning Machines)

Most implemented papers

Revisiting ResNets: Improved Training and Scaling Strategies

tensorflow/tpu NeurIPS 2021

Using improved training and scaling strategies, we design a family of ResNet architectures, ResNet-RS, which are 1.7x to 2.7x faster than EfficientNets on TPUs, while achieving similar accuracies on ImageNet.

DiT: Self-supervised Pre-training for Document Image Transformer

microsoft/unilm 4 Mar 2022

We leverage DiT as the backbone network in a variety of vision-based Document AI tasks, including document image classification, document layout analysis, table detection as well as text detection for OCR.

LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding

jpwang/lilt ACL 2022

LiLT can be pre-trained on the structured documents of a single language and then directly fine-tuned on other languages with the corresponding off-the-shelf monolingual/multilingual pre-trained textual models.

LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

microsoft/unilm 18 Apr 2022

In this paper, we propose LayoutLMv3 to pre-train multimodal Transformers for Document AI with unified text and image masking.
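To make "unified text and image masking" concrete, here is a minimal sketch of preparing one masked training example: some text tokens are replaced by a mask id, with their originals kept as labels, and a subset of image patches is flagged for masking. It is a simplification, not LayoutLMv3's procedure. Positions are drawn independently at random rather than as spans or blocks. The ratios, the mask id, and the function name `mask_inputs` are hypothetical. The `-100` "ignore" label is simply a common convention for unmasked positions.

```python
import numpy as np

def mask_inputs(token_ids, num_patches, mask_token_id, text_ratio, image_ratio, rng):
    """Hypothetical helper: jointly mask text tokens and image patches.

    Simplification: masked positions are sampled uniformly at random,
    not as contiguous spans (text) or blocks (image).
    """
    token_ids = np.asarray(token_ids).copy()  # never modify the caller's array

    # Text side: choose positions, record original ids as labels, then mask them.
    n_text = int(round(text_ratio * len(token_ids)))
    text_pos = rng.choice(len(token_ids), size=n_text, replace=False)
    labels = np.full(len(token_ids), -100)  # -100 marks "not a prediction target"
    labels[text_pos] = token_ids[text_pos]
    token_ids[text_pos] = mask_token_id

    # Image side: a boolean mask over patch indices to be hidden from the encoder.
    n_img = int(round(image_ratio * num_patches))
    patch_mask = np.zeros(num_patches, dtype=bool)
    patch_mask[rng.choice(num_patches, size=n_img, replace=False)] = True

    return token_ids, labels, patch_mask
```

A model would then be trained to recover the labeled text tokens and the content of the flagged patches from the remaining context.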

ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding

PaddlePaddle/PaddleNLP 12 Oct 2022

Recent years have witnessed the rise and success of pre-training techniques in visually-rich document understanding.

Light-Weighted CNN for Text Classification

RituYadav92/Lightweighted-CNN-for-Document-Classification 16 Apr 2020

As a solution to this problem, we introduced a whole new architecture based on separable convolution.
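The excerpt does not give the paper's layer configuration. As a generic illustration of why separable convolution is "light-weighted," here is a sketch of a depthwise separable 1-D convolution over a sequence of token embeddings, assuming stride 1 and no padding. A depthwise step filters each channel with its own kernel, and a pointwise (1x1) step then mixes channels. The function names and sizes are hypothetical.

```python
import numpy as np

def depthwise_separable_conv1d(x, depthwise_k, pointwise_w):
    """x: (seq_len, in_ch); depthwise_k: (kernel_size, in_ch);
    pointwise_w: (in_ch, out_ch). Stride 1, 'valid' padding."""
    seq_len, in_ch = x.shape
    k = depthwise_k.shape[0]
    out_len = seq_len - k + 1
    # Depthwise step: each input channel is filtered by its own 1-D kernel.
    depth = np.empty((out_len, in_ch))
    for t in range(out_len):
        depth[t] = np.sum(x[t:t + k] * depthwise_k, axis=0)
    # Pointwise step: a 1x1 convolution (a matrix product) mixes channels.
    return depth @ pointwise_w

def param_counts(kernel_size, in_ch, out_ch):
    """Weights (no biases) of a standard vs. a depthwise separable conv layer."""
    standard = kernel_size * in_ch * out_ch
    separable = kernel_size * in_ch + in_ch * out_ch
    return standard, separable
```

For example, with a kernel of width 3, 300-dimensional embeddings, and 128 output channels, `param_counts(3, 300, 128)` gives 115,200 weights for a standard convolution versus 39,300 for the separable one. This is the kind of saving such a design relies on.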

Improving accuracy and speeding up Document Image Classification through parallel systems

javiferran/document-classification 16 Jun 2020

This paper presents a study showing the benefits of EfficientNet models compared with heavier Convolutional Neural Networks (CNNs) in the Document Classification task, an essential problem in the digitalization process of institutions.

Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer

uakarsh/TiLT-Implementation 18 Feb 2021

We address the challenging problem of Natural Language Comprehension beyond plain-text documents by introducing the TILT neural network architecture which simultaneously learns layout information, visual features, and textual semantics.

StructuralLM: Structural Pre-training for Form Understanding

alibaba/AliceMind ACL 2021

Large pre-trained language models achieve state-of-the-art results when fine-tuned on downstream NLP tasks.

DocFormer: End-to-End Transformer for Document Understanding

shabie/docformer ICCV 2021

DocFormer uses text, vision and spatial features and combines them using a novel multi-modal self-attention layer.
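One way to picture combining spatial features inside self-attention is to add a layout term to the attention scores. The following is a deliberately simplified sketch of that idea, not DocFormer's actual layer. Features from one modality (text or visual) supply the usual query/key/value projections, and layout embeddings contribute a second query-key score that is summed in before the softmax. All weight names and shapes are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def layout_aware_attention(feat, spatial, Wq, Wk, Wv, Wq_s, Wk_s):
    """Simplified single-head attention with an added spatial (layout) term.

    feat:    (n, d_model) features of one modality (e.g. text or visual)
    spatial: (n, d_model) layout embeddings for the same n positions
    W*:      (d_model, d) projection matrices (hypothetical parameters)
    """
    d = Wq.shape[1]
    q, k, v = feat @ Wq, feat @ Wk, feat @ Wv
    q_s, k_s = spatial @ Wq_s, spatial @ Wk_s
    # Content-content score plus layout-layout score, scaled as in standard attention.
    scores = (q @ k.T + q_s @ k_s.T) / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights
```

Running the same function for a text stream and a visual stream, with one shared `spatial` input, shows how a single layout signal can bias attention in both modalities. Setting `spatial` to zeros reduces it to ordinary scaled dot-product attention.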