TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Relation Extraction	FUNSD	LayoutLM	F1	42.83	# 7
Document Image Classification	RVL-CDIP	Pre-trained LayoutLM	Accuracy	94.42%	# 15
Document Image Classification	RVL-CDIP	Pre-trained LayoutLM	Parameters	160M	# 21

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/layoutlm-pre-training-of-text-and-layout-for/relation-extraction-on-funsd)](https://paperswithcode.com/sota/relation-extraction-on-funsd?p=layoutlm-pre-training-of-text-and-layout-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/layoutlm-pre-training-of-text-and-layout-for/document-image-classification-on-rvl-cdip)](https://paperswithcode.com/sota/document-image-classification-on-rvl-cdip?p=layoutlm-pre-training-of-text-and-layout-for)`

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

31 Dec 2019 · Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou

Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the widespread use of pre-training models for NLP applications, they almost exclusively focus on text-level manipulation, while neglecting layout and style information that is vital for document image understanding. In this paper, we propose LayoutLM to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents. Furthermore, we also leverage image features to incorporate words' visual information into LayoutLM. To the best of our knowledge, this is the first time that text and layout are jointly learned in a single framework for document-level pre-training. It achieves new state-of-the-art results in several downstream tasks, including form understanding (from 70.72 to 79.27), receipt understanding (from 94.02 to 95.24) and document image classification (from 93.07 to 94.42). The code and pre-trained LayoutLM models are publicly available at https://aka.ms/layoutlm.
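To make "jointly modeling text and layout" concrete, here is a minimal sketch of how each word might be paired with its position on the page before being fed to such a model. It assumes boxes arrive as pixel coordinates (x0, y0, x1, y1) from an OCR step and are rescaled to a 0-1000 integer grid, the convention used by the LayoutLM implementation in huggingface/transformers. The helper names `normalize_box` and `pair_words_with_layout` are hypothetical and do not come from the paper or any library.

```python
from typing import List, Sequence, Tuple

# Assumed input format: (x0, y0, x1, y1) in pixels, origin at the top-left corner.
Box = Tuple[int, int, int, int]


def normalize_box(box: Box, page_width: int, page_height: int,
                  scale: int = 1000) -> List[int]:
    """Rescale a pixel-space box to an integer grid of size `scale`.

    Hypothetical helper: makes coordinates independent of page resolution.
    """
    x0, y0, x1, y1 = box
    return [
        int(scale * x0 / page_width),
        int(scale * y0 / page_height),
        int(scale * x1 / page_width),
        int(scale * y1 / page_height),
    ]


def pair_words_with_layout(words: Sequence[str], boxes: Sequence[Box],
                           page_width: int, page_height: int
                           ) -> List[Tuple[str, List[int]]]:
    """Attach a normalized box to every word (one box per word)."""
    if len(words) != len(boxes):
        raise ValueError("each word needs exactly one bounding box")
    return [
        (word, normalize_box(box, page_width, page_height))
        for word, box in zip(words, boxes)
    ]


# Illustrative data only: two words on a 1000 x 800 pixel page.
example = pair_words_with_layout(
    ["Invoice", "Total"],
    [(100, 50, 300, 90), (100, 700, 220, 740)],
    page_width=1000,
    page_height=800,
)
```

The resulting (word, box) pairs are the kind of input a text-plus-layout model consumes: the word goes through the usual text embedding, and the four box coordinates supply the position on the page.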

Code

Repository	Stars
microsoft/unilm (official)	18,262
huggingface/transformers	124,457
PaddlePaddle/PaddleOCR	38,252
PaddlePaddle/PaddleNLP	11,359

15 implementations are listed in total.

Tasks

Document AI

Document Image Classification

Document Layout Analysis

Image Classification

Key Information Extraction

Relation Extraction

Datasets

FUNSD

RVL-CDIP

Results from the Paper

Ranked #7 on Relation Extraction on FUNSD
