Document Image Classification

19 papers with code • 7 benchmarks • 2 datasets

Document image classification is the task of classifying documents based on images of their contents.

(Image credit: Real-Time Document Image Classification using Deep CNN and Extreme Learning Machines)



Most implemented papers

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

google-research/vision_transformer ICLR 2021

While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited.

RoBERTa: A Robustly Optimized BERT Pretraining Approach

pytorch/fairseq 26 Jul 2019

Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging.

Training data-efficient image transformers & distillation through attention

facebookresearch/deit 23 Dec 2020

In this work, we produce a competitive convolution-free transformer by training on Imagenet only.

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

microsoft/unilm 31 Dec 2019

In this paper, we propose the LayoutLM to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents.

BEiT: BERT Pre-Training of Image Transformers

microsoft/unilm ICLR 2022

We first "tokenize" the original image into visual tokens.

Cutting the Error by Half: Investigation of Very Deep CNN and Advanced Training Strategies for Document Image Classification

microsoft/unilm 11 Apr 2017

We present an exhaustive investigation of recent Deep Learning architectures, algorithms, and strategies for the task of document image classification to finally reduce the error by more than half.

Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks

microsoft/unilm 29 Jan 2018

In this work, a region-based Deep Convolutional Neural Network framework is proposed for document structure learning.

LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding

microsoft/unilm ACL 2021

Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents.

Revisiting ResNets: Improved Training and Scaling Strategies

tensorflow/tpu NeurIPS 2021

Using improved training and scaling strategies, we design a family of ResNet architectures, ResNet-RS, which are 1.7x - 2.7x faster than EfficientNets on TPUs, while achieving similar accuracies on ImageNet.

LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding

microsoft/unilm 18 Apr 2021

In this paper, we present LayoutXLM, a multimodal pre-trained model for multilingual document understanding, which aims to bridge the language barriers for visually-rich document understanding.