Improving accuracy and speeding up Document Image Classification through parallel systems

16 Jun 2020 · Javier Ferrando, Juan Luis Dominguez, Jordi Torres, Raul Garcia, David Garcia, Daniel Garrido, Jordi Cortada, Mateo Valero

This paper presents a study showing the benefits of EfficientNet models compared with heavier Convolutional Neural Networks (CNNs) in the Document Classification task, an essential problem in the digitalization process of institutions. We show on the RVL-CDIP dataset that we can improve on previous results with a much lighter model, and we present its transfer learning capabilities on a smaller in-domain dataset, Tobacco3482. Moreover, we present an ensemble pipeline that improves on image-only classification by combining the image model's predictions with those generated by a BERT model on text extracted by OCR. We also show that the batch size can be effectively increased without hindering accuracy, so that training can be sped up by parallelizing across multiple GPUs, decreasing the computational time needed. Lastly, we expose the training performance differences between the PyTorch and TensorFlow deep learning frameworks.
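
The abstract does not say how the image and text predictions are combined. As an illustration only, the following minimal sketch assumes a weighted average of the two models' class probabilities (late fusion). The function names, the `image_weight` parameter, and the example logits are hypothetical and are not taken from the paper or its repository.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(
    image_logits: Sequence[float],
    text_logits: Sequence[float],
    image_weight: float = 0.5,  # hypothetical knob, not a value from the paper
) -> Tuple[int, List[float]]:
    """Average the per-class probabilities of an image model and a text model.

    Returns the index of the winning class and the combined probabilities.
    """
    if len(image_logits) != len(text_logits):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= image_weight <= 1.0:
        raise ValueError("image_weight must lie in [0, 1]")

    p_image = softmax(image_logits)
    p_text = softmax(text_logits)
    combined = [
        image_weight * pi + (1.0 - image_weight) * pt
        for pi, pt in zip(p_image, p_text)
    ]
    best = max(range(len(combined)), key=combined.__getitem__)
    return best, combined


# Made-up scores for three classes: the image model narrowly prefers class 0,
# while the text model strongly prefers class 1.
image_logits = [2.0, 1.8, 0.1]
text_logits = [0.2, 3.0, 0.1]
best, combined = ensemble_predict(image_logits, text_logits)
```

With these made-up scores the image model alone would pick class 0, while the averaged probabilities pick class 1, which is the kind of correction a text model run on OCR output could provide. In practice the weight would need to be tuned on a validation set.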

Code

javiferran/document-classification

Tasks

Document Classification

Document Image Classification

General Classification

Image Classification

Multi-Modal Document Classification

Optical Character Recognition (OCR)

Transfer Learning

Datasets

ImageNet

RVL-CDIP

Tobacco-3482

Results from the Paper

Ranked #1 on Multi-Modal Document Classification on Tobacco-3482

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Document Image Classification	RVL-CDIP	Pre-trained EfficientNet	Accuracy	92.31%	#22
Multi-Modal Document Classification	Tobacco-3482	EfficientNet+BERT	Accuracy	89.47%	#1

Methods

1x1 Convolution • Adam • Attention Dropout • Average Pooling • Batch Normalization • BERT • Convolution • Dense Connections • Depthwise Convolution • Depthwise Separable Convolution • Dropout • EfficientNet • GELU • Inverted Residual Block • Layer Normalization • Linear Layer • Linear Warmup With Linear Decay • Multi-Head Attention • Pointwise Convolution • ReLU • Residual Connection • RMSProp • Scaled Dot-Product Attention • Sigmoid Activation • Softmax • Squeeze-and-Excitation Block • Swish • Weight Decay • WordPiece
