PubLayNet: largest dataset ever for document layout analysis

16 Aug 2019 · Xu Zhong, Jianbin Tang, Antonio Jimeno Yepes

Recognizing the layout of unstructured digital documents is an important step when parsing the documents into a structured, machine-readable format for downstream applications. Deep neural networks developed for computer vision have proven to be an effective method for analyzing the layout of document images. However, the document layout datasets that are currently publicly available are several orders of magnitude smaller than established computer vision datasets. As a result, models have to be trained by transfer learning from a base model that is pre-trained on a traditional computer vision dataset. In this paper, we develop the PubLayNet dataset for document layout analysis by automatically matching the XML representations and the content of over 1 million PDF articles that are publicly available on PubMed Central. The dataset is comparable in size to established computer vision datasets, containing over 360 thousand document images in which typical document layout elements are annotated. The experiments demonstrate that deep neural networks trained on PubLayNet accurately recognize the layout of scientific articles. The pre-trained models are also a more effective base for transfer learning to a different document domain. We release the dataset (https://github.com/ibm-aur-nlp/PubLayNet) to support the development and evaluation of more advanced models for document layout analysis.
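As a minimal sketch of working with the released data, assuming the train/val splits ship with COCO-style JSON annotations (the file and directory names below are placeholders, not paths documented in the paper), the annotations can be inspected with pycocotools:

# Sketch: inspecting PubLayNet annotations, assuming COCO-format JSON.
# "publaynet/val.json" is a hypothetical local path to the val annotations.
from collections import Counter

from pycocotools.coco import COCO

coco = COCO("publaynet/val.json")

# The layout categories annotated in PubLayNet: text, title, list, table, figure.
categories = {c["id"]: c["name"] for c in coco.loadCats(coco.getCatIds())}
print("categories:", categories)

# Count annotated regions per category across the split.
counts = Counter(categories[a["category_id"]] for a in coco.loadAnns(coco.getAnnIds()))
print("regions per category:", dict(counts))

# Look at one page: its image metadata and the layout regions on it.
img_id = coco.getImgIds()[0]
img = coco.loadImgs(img_id)[0]
anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))
print(img["file_name"], "has", len(anns), "layout regions")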


Datasets


Introduced in the Paper:

PubLayNet
Task: Document Layout Analysis · Dataset: PubLayNet val

Model          Metric    Value    Global Rank
Mask R-CNN     Text      0.916    #11
Mask R-CNN     Title     0.840    #11
Mask R-CNN     List      0.886    #11
Mask R-CNN     Table     0.960    #12
Mask R-CNN     Figure    0.949    #11
Mask R-CNN     Overall   0.910    #11
Faster R-CNN   Text      0.910    #12
Faster R-CNN   Title     0.826    #12
Faster R-CNN   List      0.883    #12
Faster R-CNN   Table     0.954    #13
Faster R-CNN   Figure    0.937    #12
Faster R-CNN   Overall   0.902    #12
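The per-category scores above are COCO-style average precision on the PubLayNet val split. A hedged sketch of how such an evaluation could be reproduced with detectron2 follows; this is not the authors' exact training setup, and the dataset paths and weights file are placeholders:

# Sketch: evaluating a Mask R-CNN detector on PubLayNet val with detectron2.
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data import build_detection_test_loader
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultPredictor
from detectron2.evaluation import COCOEvaluator, inference_on_dataset

# Register the COCO-style annotations (hypothetical local paths).
register_coco_instances("publaynet_val", {}, "publaynet/val.json", "publaynet/val")

cfg = get_cfg()
cfg.merge_from_file(
    model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 5       # text, title, list, table, figure
cfg.MODEL.WEIGHTS = "model_final.pth"     # placeholder: weights fine-tuned on PubLayNet
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5

predictor = DefaultPredictor(cfg)

# Report COCO-style AP per category, comparable to the table above.
evaluator = COCOEvaluator("publaynet_val", output_dir="./eval_out")
loader = build_detection_test_loader(cfg, "publaynet_val")
print(inference_on_dataset(predictor.model, loader, evaluator))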
