Document Image Classification

24 papers with code • 8 benchmarks • 4 datasets

Document image classification is the task of classifying documents based on images of their contents.

(Image credit: Real-Time Document Image Classification using Deep CNN and Extreme Learning Machines)

Benchmarks

These leaderboards track progress in Document Image Classification.

Dataset	Best Model
RVL-CDIP	EAML
Tobacco-3482	DocXClassifier-L
Noisy Bangla Numeral	PCGAN-CHAR
Noisy Bangla Characters	PCGAN-CHAR
n-MNIST	PCGAN-CHAR
Noisy MNIST	PCGAN-CHAR
AIP	ResNet-RS (ResNet-200 + RS training tricks)
SUT	CNN

Libraries

Use these libraries to find Document Image Classification models and implementations.

huggingface/transformers • 10 papers • 125,940 stars
rwightman/pytorch-image-models • 4 papers • 29,949 stars
facebookresearch/data2vec_vision • 4 papers
PaddlePaddle/PaddleOCR • 3 papers • 38,910 stars

See all 13 libraries.

Most implemented papers

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

google-research/vision_transformer • ICLR 2021

While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited.

RoBERTa: A Robustly Optimized BERT Pretraining Approach

pytorch/fairseq • 26 Jul 2019

Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging.

Training data-efficient image transformers & distillation through attention

facebookresearch/deit • 23 Dec 2020

In this work, we produce a competitive convolution-free transformer by training on Imagenet only.

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

microsoft/unilm • 31 Dec 2019

In this paper, we propose the LayoutLM to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents.

BEiT: BERT Pre-Training of Image Transformers

microsoft/unilm • ICLR 2022

We first "tokenize" the original image into visual tokens.

LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding

microsoft/unilm • 18 Apr 2021

In this paper, we present LayoutXLM, a multimodal pre-trained model for multilingual document understanding, which aims to bridge the language barriers for visually-rich document understanding.

Cutting the Error by Half: Investigation of Very Deep CNN and Advanced Training Strategies for Document Image Classification

microsoft/unilm • 11 Apr 2017

We present an exhaustive investigation of recent Deep Learning architectures, algorithms, and strategies for the task of document image classification to finally reduce the error by more than half.

LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding

microsoft/unilm • ACL 2021

Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents.

Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks

microsoft/unilm • 29 Jan 2018

In this work, a region-based Deep Convolutional Neural Network framework is proposed for document structure learning.

OCR-free Document Understanding Transformer

clovaai/donut • 30 Nov 2021

Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs.
