This dataset contains complex tables from the annual reports of S&P 500 companies, with detailed table structure annotations to support table structure recognition and table data extraction. Released by IBM Research, it consists of 89,646 pages containing 112,887 tables with annotated cell structure.

The cell structure labels were generated by matching tokens between the PDF and HTML versions of each document. Compared with tables in scientific and government documents, financial tables are more diverse in style: they tend to have fewer graphical lines, larger gaps within each table, and more colour variation. The dataset reflects these characteristics.
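The general idea of token matching can be illustrated with a deliberately simplified sketch. It is not IBM's actual annotation pipeline: the `PdfToken` type, the `match_cells` helper and the greedy exact-match rule are hypothetical, and the sketch assumes the PDF words have already been extracted in reading order with bounding boxes, and that the HTML cell texts are listed in the same order. Each HTML cell is labelled with the union of the bounding boxes of the PDF tokens it matches.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass
class PdfToken:
    """Hypothetical PDF word with its bounding box."""
    text: str
    bbox: BBox


def match_cells(cell_texts: List[str], pdf_tokens: List[PdfToken]) -> List[Optional[BBox]]:
    """Greedy, order-preserving token matching (a simplified assumption).

    For each HTML cell, search forward for a contiguous run of PDF tokens
    equal to the cell's whitespace-separated words. A matched cell gets the
    union of the run's bounding boxes; an unmatched or empty cell gets None.
    """
    labels: List[Optional[BBox]] = []
    pos = 0  # tokens before this index have already been consumed
    for text in cell_texts:
        words = text.split()
        if not words:
            labels.append(None)
            continue
        found = None
        for start in range(pos, len(pdf_tokens) - len(words) + 1):
            if all(pdf_tokens[start + i].text == w for i, w in enumerate(words)):
                found = start
                break
        if found is None:
            labels.append(None)
            continue
        run = pdf_tokens[found:found + len(words)]
        labels.append((
            min(t.bbox[0] for t in run),
            min(t.bbox[1] for t in run),
            max(t.bbox[2] for t in run),
            max(t.bbox[3] for t in run),
        ))
        pos = found + len(words)
    return labels


# Toy example: PDF tokens in reading order and HTML cell texts.
tokens = [
    PdfToken("Revenue", (10, 10, 60, 20)),
    PdfToken("2019", (100, 10, 130, 20)),
    PdfToken("Total", (10, 25, 40, 35)),
    PdfToken("revenue", (45, 25, 90, 35)),
    PdfToken("1,234", (100, 25, 130, 35)),
]
cells = ["Revenue", "2019", "Total revenue", "1,234", "Missing"]
result = match_cells(cells, tokens)
print(result)
```

Real financial tables complicate this considerably (split numbers, footnote markers, currency symbols, repeated values), so a practical pipeline would need more tolerant matching than exact word equality.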

Source: IBM Developer
