Document Image Classification

24 papers with code • 8 benchmarks • 4 datasets

Document image classification is the task of classifying documents based on images of their contents.

( Image credit: Real-Time Document Image Classification using Deep CNN and Extreme Learning Machines )


Use these libraries to find Document Image Classification models and implementations

Most implemented papers

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

google-research/vision_transformer ICLR 2021

While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited.

RoBERTa: A Robustly Optimized BERT Pretraining Approach

pytorch/fairseq 26 Jul 2019

Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging.

Training data-efficient image transformers & distillation through attention

facebookresearch/deit 23 Dec 2020

In this work, we produce a competitive convolution-free transformer by training on Imagenet only.

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

microsoft/unilm 31 Dec 2019

In this paper, we propose the \textbf{LayoutLM} to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents.

BEiT: BERT Pre-Training of Image Transformers

microsoft/unilm ICLR 2022

We first "tokenize" the original image into visual tokens.

LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding

microsoft/unilm 18 Apr 2021

In this paper, we present LayoutXLM, a multimodal pre-trained model for multilingual document understanding, which aims to bridge the language barriers for visually-rich document understanding.

Cutting the Error by Half: Investigation of Very Deep CNN and Advanced Training Strategies for Document Image Classification

microsoft/unilm 11 Apr 2017

We present an exhaustive investigation of recent Deep Learning architectures, algorithms, and strategies for the task of document image classification to finally reduce the error by more than half.

LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding

microsoft/unilm ACL 2021

Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents.

Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks

microsoft/unilm 29 Jan 2018

In this work, a region-based Deep Convolutional Neural Network framework is proposed for document structure learning.

OCR-free Document Understanding Transformer

clovaai/donut 30 Nov 2021

Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs.