TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Document Image Classification	RVL-CDIP	DocXClassifier-B	Accuracy	94.00%	#16
Document Image Classification	RVL-CDIP	DocXClassifier-B	Parameters	95.4M	#18
Document Image Classification	Tobacco-3482	DocXClassifier-L	Accuracy	95.57%	#1

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/docxclassifier-high-performance-explainable/document-image-classification-on-tobacco-3482)](https://paperswithcode.com/sota/document-image-classification-on-tobacco-3482?p=docxclassifier-high-performance-explainable)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/docxclassifier-high-performance-explainable/document-image-classification-on-rvl-cdip)](https://paperswithcode.com/sota/document-image-classification-on-rvl-cdip?p=docxclassifier-high-performance-explainable)`

DocXClassifier: High Performance Explainable Deep Network for Document Image Classification

TechArXiv 2022 · Saifullah, Stefan Agne, Andreas Dengel, Sheraz Ahmed

Convolutional Neural Networks (ConvNets) have been thoroughly researched for document image classification and are known for their exceptional performance in unimodal image-based document classification. Recently, however, there has been a sudden shift in the field towards multimodal approaches that simultaneously learn from the visual and textual features of the documents. While this has led to significant advances in the field, it has also led to a waning interest in improving pure ConvNets-based approaches. This is not desirable, as many of the multimodal approaches still use ConvNets as their visual backbone, and thus improving ConvNets is essential to improving these approaches. In this paper, we present DocXClassifier, a ConvNet-based approach that, using state-of-the-art model design patterns together with modern data augmentation and training strategies, not only achieves significant performance improvements in image-based document classification, but also outperforms some of the recently proposed multimodal approaches. Moreover, DocXClassifier is capable of generating transformer-like attention maps, which makes it inherently interpretable, a property not found in previous image-based classification models. Our approach achieves a new peak performance in image-based classification on two popular document datasets, namely RVL-CDIP and Tobacco3482, with a top-1 classification accuracy of 94.17% and 95.57% on the two datasets, respectively. Moreover, it sets a new record for the highest image-based classification accuracy of 90.14% on Tobacco3482 without transfer learning from RVL-CDIP. Finally, our proposed model may serve as a powerful visual backbone for future multimodal approaches, by providing much richer visual features than existing counterparts.

Code

saifullah3396/docxclassifier

Tasks

Classification

Data Augmentation

Document Classification

Document Image Classification

Image Classification

Transfer Learning

Vocal Bursts Intensity Prediction

Datasets

RVL-CDIP Tobacco-3482

Results from the Paper

Ranked #1 on Document Image Classification on Tobacco-3482

