PubTabNet

Introduced by Zhong et al. in Image-based table recognition: data, model, and evaluation

PubTabNet is a large dataset for image-based table recognition. It contains more than 568k images of tabular data, each annotated with the HTML representation of the table it shows. The table images are extracted from scientific publications in the PubMed Central Open Access Subset (commercial use collection). Table regions are identified by matching the PDF and XML versions of each article in that subset. More details are available in the paper "Image-based table recognition: data, model, and evaluation".
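To make the idea of an image annotated with an HTML representation concrete, the following is a minimal sketch of how one such annotation might be turned into an HTML string. This page does not describe the annotation layout, so the record shape used here (a list of structure tokens plus a list of per-cell content tokens), the field names, and the helper `build_html` are all assumptions for illustration. The sample record is invented and not taken from the dataset.

```python
from typing import Dict, List


def build_html(annotation: Dict) -> str:
    """Merge structure tokens and per-cell content tokens into one HTML string.

    Assumed (hypothetical) layout: annotation["html"]["structure"]["tokens"]
    holds the table skeleton, and annotation["html"]["cells"] holds one entry
    per cell, in reading order, each with its own list of content tokens.
    """
    structure: List[str] = annotation["html"]["structure"]["tokens"]
    cells = iter(annotation["html"]["cells"])
    out: List[str] = []
    for token in structure:
        out.append(token)
        # A cell's content goes right after its opening tag is complete:
        # "<td>" for a plain cell, or ">" closing a tag such as "<td colspan=...>".
        if token in ("<td>", ">"):
            out.append("".join(next(cells)["tokens"]))
    return "<table>" + "".join(out) + "</table>"


# Invented sample record, for illustration only.
sample = {
    "filename": "example.png",
    "html": {
        "structure": {
            "tokens": [
                "<thead>", "<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>", "</thead>",
                "<tbody>", "<tr>", "<td", ' colspan="2"', ">", "</td>", "</tr>", "</tbody>",
            ]
        },
        "cells": [
            {"tokens": ["G", "e", "n", "e"]},
            {"tokens": ["p"]},
            {"tokens": ["B", "R", "C", "A", "1"]},
        ],
    },
}

html = build_html(sample)
print(html)
```

Keeping the table skeleton separate from the cell text, as assumed here, would let a model or metric treat structure recognition and cell content recognition as separate problems. Whether the released files follow this exact layout should be checked against the dataset's own documentation.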

Benchmarks

Task	Dataset Variant	Best Model
Table Recognition	PubTabNet	TableMaster

Tasks

Table Recognition

Similar Datasets

TabLeX

PubTables-1M

WTW

IIIT-AR-13K

License

Unknown
