Document Image Classification

24 papers with code • 8 benchmarks • 4 datasets

Document image classification is the task of assigning a category (for example letter, invoice, or form) to a document based on an image of its contents.

(Image credit: Real-Time Document Image Classification using Deep CNN and Extreme Learning Machines)

Benchmarks

These leaderboards track progress in document image classification.

Dataset	Best Model
RVL-CDIP	EAML
Tobacco-3482	DocXClassifier-L
Noisy Bangla Numeral	PCGAN-CHAR
Noisy Bangla Characters	PCGAN-CHAR
n-MNIST	PCGAN-CHAR
Noisy MNIST	PCGAN-CHAR
AIP	ResNet-RS (ResNet-200 + RS training tricks)
SUT	CNN

Libraries

Use these libraries to find document image classification models and implementations.

Library	Papers	Stars
huggingface/transformers	11	124,593
rwightman/pytorch-image-models	4	29,671
facebookresearch/data2vec_vision	4	(not listed)
PaddlePaddle/PaddleOCR	3	38,330

13 libraries are listed in total.

Latest papers

SUT: a new multi-purpose synthetic dataset for Farsi document image analysis

aliiafkari/SUT_Dataset • 13th International Conference on Computer and Knowledge Engineering (ICCKE) 2023

This paper introduces a new large-scale dataset for Farsi document images, named SUT, which aims to tackle the challenges associated with obtaining diverse and substantial ground-truth data for supervised models in document image analysis (DIA) tasks, such as document image classification, text detection and recognition, and information retrieval.

27 Nov 2023

StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training

PaddlePaddle/VIMER • 1 Mar 2023 • 479 stars

Compared to the masked multi-modal modeling methods for document image understanding that rely on both the image and text modalities, StrucTexTv2 models image-only input and potentially deals with more application scenarios free from OCR pre-processing.

Multimodal Side-Tuning for Document Classification

thezingaro/multimodal-side-tuning • 16 Jan 2023

In this paper, we propose to exploit the side-tuning framework for multimodal document classification.

ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding

PaddlePaddle/PaddleNLP • 12 Oct 2022 • 11,383 stars

Recent years have witnessed the rise and success of pre-training techniques in visually-rich document understanding.

LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

huggingface/transformers • 18 Apr 2022 • 124,593 stars

In this paper, we propose LayoutLMv3 to pre-train multimodal Transformers for Document AI with unified text and image masking.

DocXClassifier: High Performance Explainable Deep Network for Document Image Classification

saifullah3396/docxclassifier • TechArXiv 2022

Our approach achieves a new peak performance in image-based classification on two popular document datasets, namely RVL-CDIP and Tobacco3482, with a top-1 classification accuracy of 94.17% and 95.57% on the two datasets, respectively.

17 Mar 2022
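
The top-1 classification accuracy quoted above is the fraction of test images whose single highest-scoring predicted class equals the ground-truth class. A minimal sketch of that metric follows; the function name `top1_accuracy` and the toy labels are hypothetical illustrations and are not taken from the DocXClassifier code or from any benchmark data.

```python
def top1_accuracy(predicted_labels, true_labels):
    """Return the fraction of samples whose top prediction matches the true label."""
    if len(predicted_labels) != len(true_labels):
        raise ValueError("predicted_labels and true_labels must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")
    # Count positions where the predicted class equals the ground-truth class.
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)


# Hypothetical toy data: five document images, one of them misclassified.
true_labels = ["letter", "invoice", "form", "memo", "email"]
predicted_labels = ["letter", "invoice", "form", "memo", "letter"]

accuracy = top1_accuracy(predicted_labels, true_labels)
print(f"top-1 accuracy: {accuracy:.2%}")  # prints "top-1 accuracy: 80.00%"
```

With real model outputs, `predicted_labels` would typically be the arg-max class per image, computed over the benchmark's held-out test split.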

DiT: Self-supervised Pre-training for Document Image Transformer

huggingface/transformers • 4 Mar 2022 • 124,593 stars

We leverage DiT as the backbone network in a variety of vision-based Document AI tasks, including document image classification, document layout analysis, table detection as well as text detection for OCR.

LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding

huggingface/transformers • ACL 2022 • 124,593 stars

LiLT can be pre-trained on the structured documents of a single language and then directly fine-tuned on other languages with the corresponding off-the-shelf monolingual/multilingual pre-trained textual models.

28 Feb 2022

OCR-free Document Understanding Transformer

huggingface/transformers • 30 Nov 2021 • 124,593 stars

Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs.

DocFormer: End-to-End Transformer for Document Understanding

shabie/docformer • ICCV 2021 • 245 stars

DocFormer uses text, vision and spatial features and combines them using a novel multi-modal self-attention layer.

22 Jun 2021
