VizNet-Sato

Introduced by Zhang et al. in Sato: Contextual Semantic Type Detection in Tables

VizNet-Sato is a dataset released by the authors of Sato and derived from the VizNet dataset. From VizNet, the authors selected only relational web tables whose headers match one of their 78 chosen DBpedia semantic types. The selected tables are divided into two categories: Full tables and Multi-column only tables. The first category contains the 78,733 tables selected from VizNet, while the second contains 32,265 tables that have more than one column. The tables of each category are split into 5 subsets for 5-fold cross validation: 4 subsets are used for training and the remaining one for evaluation.
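The 5-fold protocol described above can be sketched as follows. This is a minimal illustration, not the authors' splitting code: the function name `five_fold_splits`, the random assignment of tables to subsets, and the integer table identifiers are all assumptions, since the source does not say how tables were assigned to subsets.

```python
import random


def five_fold_splits(table_ids, k=5, seed=0):
    """Yield (train_ids, eval_ids) pairs for k-fold cross validation.

    Hypothetical helper: random assignment is an assumption, not the
    documented procedure used to build VizNet-Sato.
    """
    ids = list(table_ids)
    random.Random(seed).shuffle(ids)
    # Deal the shuffled IDs round-robin into k disjoint subsets.
    folds = [ids[i::k] for i in range(k)]
    for held_out in range(k):
        eval_ids = folds[held_out]
        # The other k - 1 subsets together form the training set.
        train_ids = [t for i, fold in enumerate(folds) if i != held_out for t in fold]
        yield train_ids, eval_ids


# Placeholder identifiers standing in for real table IDs.
splits = list(five_fold_splits(range(10)))
```

Each table appears in exactly one evaluation subset, so every table is evaluated once across the 5 runs.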

The headers of the columns act as semantic annotations for the Column Type Annotation (CTA) task. Some statistics about both categories of tables are provided in the table below, where "Columns" refers to the number of annotated columns and "Classes" to the number of unique DBpedia semantic types used for annotation.

Category       Columns   Classes
Full           120,609   78
Multi-column    74,141   78
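
The idea of using column headers as CTA labels can be sketched as below. This is an illustrative assumption-laden example, not the dataset's actual preprocessing: the function `columns_to_cta_examples`, the lowercase/strip header normalization, the dict-based table layout, and the three type names are all hypothetical placeholders.

```python
def columns_to_cta_examples(table, semantic_types):
    """Turn one table into (column_values, label) pairs for CTA.

    `table` maps each header to its list of cell values. A column is kept
    only if its normalized header is one of the selected semantic types.
    The normalization used here (strip + lowercase) is an assumption.
    """
    examples = []
    for header, values in table.items():
        label = header.strip().lower()
        if label in semantic_types:
            examples.append((values, label))
    return examples


# Placeholder subset of types; the dataset itself uses 78 DBpedia types.
types = {"name", "city", "year"}
table = {
    "Name": ["Ada", "Alan"],
    "City": ["London", "Manchester"],
    "Notes": ["-", "-"],  # header matches no selected type, so it is skipped
}
examples = columns_to_cta_examples(table, types)
```

Here the "Notes" column yields no training example, which mirrors why the number of annotated columns is counted separately from the number of tables.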

License


  • Unknown