The WikipediaGS dataset was created by extracting tables from Wikipedia pages. It consists of 485,096 tables whose cells were annotated with DBpedia entities for the Cell Entity Annotation (CEA) task.

Additionally, Chen et al. annotated a subset of 604 of these tables for the Column Type Annotation (CTA) task, labeling selected columns with DBpedia types. This subset is available for download from their official GitHub repository.

The table below shows, for each task, the number of annotated cells (CEA) or columns (CTA) and the number of distinct classes used for the annotation.

Task   Annotations   Classes
CEA    4,453,329     1,222,358
CTA    620           31
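
To put these counts in perspective, the short Python sketch below derives the average number of annotations per table and per class for each task. It uses only the figures stated above and assumes that every annotation belongs to one of the tables counted for its task. The names TABLES, ANNOTATIONS, CLASSES and ratios are hypothetical and chosen for illustration; they are not part of the dataset or its tooling.

```python
# Figures copied from the description and table above.
TABLES = {"CEA": 485_096, "CTA": 604}
ANNOTATIONS = {"CEA": 4_453_329, "CTA": 620}
CLASSES = {"CEA": 1_222_358, "CTA": 31}


def ratios(task):
    """Return (annotations per table, annotations per class) for a task.

    Assumption: all annotations of a task fall within the tables
    counted for that task.
    """
    per_table = ANNOTATIONS[task] / TABLES[task]
    per_class = ANNOTATIONS[task] / CLASSES[task]
    return per_table, per_class


for task in ("CEA", "CTA"):
    per_table, per_class = ratios(task)
    print(f"{task}: {per_table:.2f} annotations per table, "
          f"{per_class:.2f} annotations per class")
```

Under that assumption, a CEA table carries roughly nine annotated cells on average, while a CTA table has only about one annotated column.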


