A benchmark dataset that contains 500K document pages with fine-grained, token-level annotations for document layout analysis. DocBank is constructed in a simple yet effective way, using weak supervision from the LaTeX documents available on arXiv.org.
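To illustrate what "token-level annotation" means, here is a minimal sketch in which every word token on a page carries its own bounding box and layout label, and the tokens can then be grouped by label. The tab-separated field layout (token, x0, y0, x1, y1, label), the names `TokenAnnotation`, `parse_line`, and `group_by_label`, and the sample lines are all hypothetical and chosen for illustration. They are not the official DocBank file format or tooling.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TokenAnnotation:
    # Hypothetical record: one word token, its bounding box, and its layout label.
    token: str
    x0: int
    y0: int
    x1: int
    y1: int
    label: str


def parse_line(line: str) -> TokenAnnotation:
    # Assumes a hypothetical tab-separated layout: token, x0, y0, x1, y1, label.
    token, x0, y0, x1, y1, label = line.rstrip("\n").split("\t")
    return TokenAnnotation(token, int(x0), int(y0), int(x1), int(y1), label)


def group_by_label(annotations):
    # Collect tokens under each layout label, keeping their input order.
    groups = defaultdict(list)
    for ann in annotations:
        groups[ann.label].append(ann.token)
    return dict(groups)


# Illustrative sample lines (made up, not taken from the dataset).
sample = [
    "DocBank\t100\t50\t220\t70\ttitle",
    "A\t100\t90\t110\t105\tparagraph",
    "benchmark\t115\t90\t190\t105\tparagraph",
]
anns = [parse_line(line) for line in sample]
print(group_by_label(anns))  # {'title': ['DocBank'], 'paragraph': ['A', 'benchmark']}
```

Labeling individual tokens rather than whole regions lets the same page be treated as a sequence-labeling problem, with region-level layouts recoverable by grouping tokens that share a label.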
Source: DocBank: A Benchmark Dataset for Document Layout Analysis