TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Document Layout Analysis	PubLayNet val	VSR	Text	0.967	# 1
Document Layout Analysis	PubLayNet val	VSR	Title	0.931	# 2
Document Layout Analysis	PubLayNet val	VSR	List	0.947	# 6
Document Layout Analysis	PubLayNet val	VSR	Table	0.974	# 8
Document Layout Analysis	PubLayNet val	VSR	Figure	0.964	# 7
Document Layout Analysis	PubLayNet val	VSR	Overall	0.957	# 3

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/vsr-a-unified-framework-for-document-layout/document-layout-analysis-on-publaynet-val)](https://paperswithcode.com/sota/document-layout-analysis-on-publaynet-val?p=vsr-a-unified-framework-for-document-layout)`

VSR: A Unified Framework for Document Layout Analysis combining Vision, Semantics and Relations

13 May 2021 · Peng Zhang, Can Li, Liang Qiao, Zhanzhan Cheng, ShiLiang Pu, Yi Niu, Fei Wu

Document layout analysis is crucial for understanding document structures. On this task, vision and semantics of documents, and relations between layout components contribute to the understanding process. Though many works have been proposed to exploit the above information, they show unsatisfactory results. NLP-based methods model layout analysis as a sequence labeling task and show insufficient capabilities in layout modeling. CV-based methods model layout analysis as a detection or segmentation task, but bear limitations of inefficient modality fusion and lack of relation modeling between layout components. To address the above limitations, we propose a unified framework VSR for document layout analysis, combining vision, semantics and relations. VSR supports both NLP-based and CV-based methods. Specifically, we first introduce vision through document image and semantics through text embedding maps. Then, modality-specific visual and semantic features are extracted using a two-stream network, which are adaptively fused to make full use of complementary information. Finally, given component candidates, a relation module based on a graph neural network is incorporated to model relations between components and output final results. On three popular benchmarks, VSR outperforms previous models by large margins. Code will be released soon.
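
The fusion and relation steps described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how features are fused or how the graph neural network is built, so the scalar sigmoid gate, the neighbor-averaging step, and the names `adaptive_fuse` and `relation_step` are all assumptions chosen for illustration, written in plain Python on small lists rather than real feature maps.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def adaptive_fuse(visual, semantic, gate_weights, gate_bias=0.0):
    """Hypothetical gated fusion of a visual and a semantic feature vector.

    A single gate in (0, 1) is computed from the concatenated features,
    then used to blend the two streams: gate * visual + (1 - gate) * semantic.
    Returns the fused vector and the gate value.
    """
    if len(visual) != len(semantic):
        raise ValueError("visual and semantic features must have equal length")
    concat = list(visual) + list(semantic)
    if len(gate_weights) != len(concat):
        raise ValueError("gate_weights must match the concatenated length")
    score = sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias
    gate = sigmoid(score)
    fused = [gate * v + (1.0 - gate) * s for v, s in zip(visual, semantic)]
    return fused, gate


def relation_step(features, edges):
    """Hypothetical relation step: one round of neighbor averaging.

    Each component candidate's feature is replaced by the mean of its own
    feature and those of its graph neighbors. Edges are undirected pairs
    of component indices. This stands in for a learned GNN layer.
    """
    n = len(features)
    neighbors = {i: {i} for i in range(n)}  # every node includes itself
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    dim = len(features[0])
    updated = []
    for i in range(n):
        group = sorted(neighbors[i])
        updated.append(
            [sum(features[j][d] for j in group) / len(group) for d in range(dim)]
        )
    return updated


# Usage: fuse two streams for one location, then relate three candidates.
fused, gate = adaptive_fuse([1.0, 3.0], [3.0, 1.0], [0.0, 0.0, 0.0, 0.0])
related = relation_step([[0.0], [2.0], [4.0]], edges=[(0, 1)])
```

In a real model the gate weights would be learned and the gate would typically vary per location or per channel; the scalar gate here only shows how a data-dependent weight can balance two modalities.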

Code

hikopensource/davar-lab-ocr (official, 708 stars)

Tasks

Document Layout Analysis

Relation

Datasets

PubLayNet, DocBank

Results from the Paper

Ranked #3 on Document Layout Analysis on PubLayNet val

Methods

Graph Neural Network
