DocFormer: End-to-End Transformer for Document Understanding

We present DocFormer -- a multi-modal transformer-based architecture for the task of Visual Document Understanding (VDU). VDU is a challenging problem that aims to understand documents in their varied formats (forms, receipts, etc.) and layouts. DocFormer is pre-trained in an unsupervised fashion using carefully designed tasks that encourage multi-modal interaction. It uses text, vision, and spatial features and combines them with a novel multi-modal self-attention layer. DocFormer also shares learned spatial embeddings across modalities, which makes it easy for the model to correlate text with visual tokens and vice versa. DocFormer is evaluated on four different datasets, each with strong baselines, and achieves state-of-the-art results on all of them, sometimes beating models 4x its size in number of parameters.
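To make the shared-spatial-embedding idea concrete, below is a minimal PyTorch sketch of a multi-modal self-attention layer in the spirit of the abstract: text and visual token streams are each projected and attended over independently, while a single spatial projection (derived from bounding-box embeddings) contributes the same attention bias to both modalities. All names, layer sizes, and the per-query bias form are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn


class MultiModalSelfAttention(nn.Module):
    """Sketch (not the paper's exact layer): text and visual streams
    attend separately, but share one spatial attention bias so tokens
    at the same document location are treated consistently."""

    def __init__(self, d_model=768, n_heads=12):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # Independent QKV projections per modality.
        self.text_qkv = nn.Linear(d_model, 3 * d_model)
        self.vis_qkv = nn.Linear(d_model, 3 * d_model)
        # One spatial projection shared across modalities (assumption:
        # a per-head, per-query additive bias; a pairwise bias is
        # equally plausible).
        self.spatial_proj = nn.Linear(d_model, n_heads)
        self.text_out = nn.Linear(d_model, d_model)
        self.vis_out = nn.Linear(d_model, d_model)

    def _attend(self, qkv, spatial_bias):
        B, T, _ = qkv.shape
        q, k, v = qkv.chunk(3, dim=-1)
        # Reshape to (B, heads, T, d_head).
        q = q.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        scores = scores + spatial_bias  # shared spatial term
        attn = scores.softmax(dim=-1)
        return (attn @ v).transpose(1, 2).reshape(B, T, -1)

    def forward(self, text_feats, vis_feats, spatial_emb):
        # spatial_emb: (B, T, d_model), an embedding of 2-D box
        # coordinates. Project to (B, T, heads), then broadcast as a
        # (B, heads, T, 1) bias over all keys.
        bias = self.spatial_proj(spatial_emb).permute(0, 2, 1).unsqueeze(-1)
        text_out = self.text_out(self._attend(self.text_qkv(text_feats), bias))
        vis_out = self.vis_out(self._attend(self.vis_qkv(vis_feats), bias))
        return text_out, vis_out


if __name__ == "__main__":
    B, T, D = 2, 16, 768
    layer = MultiModalSelfAttention()
    text, vis, spatial = (torch.randn(B, T, D) for _ in range(3))
    t_out, v_out = layer(text, vis, spatial)
    print(t_out.shape, v_out.shape)  # torch.Size([2, 16, 768]) each
```

Because the spatial bias comes from one shared projection rather than per-modality ones, a text token and a visual token covering the same region receive the same positional signal, which is one plausible reading of how shared spatial embeddings help the model align the two modalities.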

ICCV 2021
Task: Document Image Classification    Dataset: RVL-CDIP

Model             Metric       Value     Global Rank
DocFormer-base    Accuracy     96.17%    # 1
DocFormer-base    Parameters   183M      # 20
DocFormer-large   Accuracy     95.50%    # 6
DocFormer-large   Parameters   536M      # 26

