The ToughTables (2T) dataset was created for the SemTab challenge and contains 180 tables in total. The tables fall into two groups: control (CTRL) tables and tough (TOUGH) tables.
11 PAPERS • 4 BENCHMARKS
The BioDiv dataset contains manually labeled tables from the biodiversity domain for the Column Type Annotation (CTA) and Cell Entity Annotation (CEA) tasks.
8 PAPERS • 2 BENCHMARKS
The WikiTables-TURL dataset was constructed by the authors of TURL from the WikiTable corpus, a large collection of Wikipedia tables. It consists of 580,171 tables divided into fixed training, validation, and test splits. The dataset also provides metadata for each table, such as the table name, table caption, and column headers.
4 PAPERS • 3 BENCHMARKS
The WikipediaGS dataset was created by extracting tables from Wikipedia pages. It consists of 485,096 tables annotated with DBpedia entities for the Cell Entity Annotation (CEA) task.
3 PAPERS • 2 BENCHMARKS