TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Key Information Extraction	CORD	GeoLayoutLM	F1	97.97	# 1
Relation Extraction	FUNSD	LayoutLMv3 large	F1	80.35	# 2
Relation Extraction	FUNSD	GeoLayoutLM	F1	89.45	# 1
Semantic entity labeling	FUNSD	GeoLayoutLM	F1	92.86	# 4

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/geolayoutlm-geometric-pre-training-for-visual/key-information-extraction-on-cord)](https://paperswithcode.com/sota/key-information-extraction-on-cord?p=geolayoutlm-geometric-pre-training-for-visual)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/geolayoutlm-geometric-pre-training-for-visual/relation-extraction-on-funsd)](https://paperswithcode.com/sota/relation-extraction-on-funsd?p=geolayoutlm-geometric-pre-training-for-visual)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/geolayoutlm-geometric-pre-training-for-visual/semantic-entity-labeling-on-funsd)](https://paperswithcode.com/sota/semantic-entity-labeling-on-funsd?p=geolayoutlm-geometric-pre-training-for-visual)`

GeoLayoutLM: Geometric Pre-training for Visual Information Extraction

CVPR 2023 · Chuwei Luo, Changxu Cheng, Qi Zheng, Cong Yao

Visual information extraction (VIE) plays an important role in Document Intelligence. Generally, it is divided into two tasks: semantic entity recognition (SER) and relation extraction (RE). Recently, pre-trained models for documents have achieved substantial progress in VIE, particularly in SER. However, most existing models learn the geometric representation only implicitly, which has been found insufficient for the RE task, where geometric information is especially crucial. Moreover, we show that another factor limiting RE performance is the objective gap between the pre-training phase and the fine-tuning phase. To tackle these issues, we propose a multi-modal framework for VIE, named GeoLayoutLM. GeoLayoutLM explicitly models geometric relations in pre-training, which we call geometric pre-training. Geometric pre-training is achieved by three specially designed geometry-related pre-training tasks. Additionally, novel relation heads, which are pre-trained by the geometric pre-training tasks and fine-tuned for RE, are designed to enrich and enhance the feature representation. In extensive experiments on standard VIE benchmarks, GeoLayoutLM achieves highly competitive scores on the SER task and significantly outperforms the previous state of the art for RE (e.g., the F1 score of RE on FUNSD is boosted from 80.35% to 89.45%). The code and models are publicly available at https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/DocumentUnderstanding/GeoLayoutLM
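To make the reported RE numbers concrete, here is a minimal sketch of how an F1 score over predicted entity links can be computed, assuming relations are scored as exact matches between directed (key, value) pairs. The function name `relation_f1`, the `Link` alias, and the toy entity IDs are hypothetical and are not taken from the paper or its code release; the official evaluation script may differ in detail.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: a relation is a directed link
# (key_entity_id, value_entity_id) between two entities on a form.
Link = Tuple[int, int]


def relation_f1(predicted: Iterable[Link], gold: Iterable[Link]) -> float:
    """Return F1 over exact-match links as a fraction in [0, 1]."""
    pred_set: Set[Link] = set(predicted)
    gold_set: Set[Link] = set(gold)
    true_positives = len(pred_set & gold_set)
    if true_positives == 0:
        # Covers empty predictions, empty gold, and no overlap.
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Toy example (made-up links, not FUNSD data).
gold_links = [(0, 1), (2, 3), (4, 5), (6, 7)]
predicted_links = [(0, 1), (2, 3), (4, 6)]
score = relation_f1(predicted_links, gold_links)

# Absolute gain quoted in the abstract, in F1 points.
absolute_gain = round(89.45 - 80.35, 2)

print(f"toy F1: {score:.4f}")
print(f"reported gain on FUNSD RE: {absolute_gain} points")
```

In the toy example two of three predicted links are correct and two of four gold links are recovered, so precision is 2/3 and recall is 1/2. The gain quoted in the abstract is an absolute difference in F1 points, not a relative improvement.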

Code

alibabaresearch/advancedliteratemac… official

930

Tasks

Document AI

entity_extraction

Key Information Extraction

Relation Extraction

Semantic entity labeling

Datasets

FUNSD CORD

Results from the Paper

Ranked #1 on Key Information Extraction on CORD
