WDC SOTAB

WDC SOTAB is a benchmark that features two annotation tasks: Column Type Annotation and Column Property Annotation. The goal of the Column Type Annotation (CTA) task is to annotate the columns of a table using 91 Schema.org types, such as telephone, duration, Place, or Organization. The goal of the Column Property Annotation (CPA) task is to annotate pairs of table columns with one of 176 Schema.org properties, such as gtin13, startDate, priceValidUntil, or recipeIngredient. The benchmark consists of 59,548 tables annotated for CTA and 48,379 tables annotated for CPA, originating from 74,215 different websites. For both tasks, the tables are split into training, validation, and test sets. The tables cover 17 popular Schema.org types, including Product, LocalBusiness, Event, and JobPosting, and are drawn from the Schema.org Table Corpus.
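To make the difference between the two tasks concrete, the following is a minimal sketch of how a single CTA label and a single CPA label could be represented. The class names, field names, and the example table name are hypothetical and chosen for illustration only; they are not taken from the benchmark's actual file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CTALabel:
    """One Column Type Annotation: a single column gets a Schema.org type."""
    table_name: str      # hypothetical identifier of the table
    column_index: int    # position of the annotated column
    label: str           # one of the 91 CTA labels, e.g. "telephone"


@dataclass(frozen=True)
class CPALabel:
    """One Column Property Annotation: a pair of columns gets a property."""
    table_name: str            # hypothetical identifier of the table
    first_column_index: int    # first column of the annotated pair
    second_column_index: int   # second column of the annotated pair
    label: str                 # one of the 176 CPA labels, e.g. "startDate"


# Illustrative instances; the table name and indices are made up.
cta_example = CTALabel("example_table", 2, "telephone")
cpa_example = CPALabel("example_table", 0, 3, "startDate")
```

The key structural difference is that a CTA label refers to one column, while a CPA label refers to a pair of columns.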

Key statistics for each task are given in the table below, where "Columns" is the number of labeled columns (CTA) or column pairs (CPA) and "Classes" is the number of unique labels used for annotation.

Task                         Columns   Classes
Column Property Annotation   174,998   176
Column Type Annotation       162,351   91
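As a rough sanity check on these figures, the sketch below divides the number of labeled columns (or column pairs) by the number of annotated tables for each task. It assumes that the column counts and the table counts quoted above cover the same set of tables; the variable names are illustrative.

```python
# Figures as quoted in the description and the table above.
cta_tables, cta_columns = 59_548, 162_351
cpa_tables, cpa_pairs = 48_379, 174_998

# Average number of annotations per annotated table.
avg_cta_per_table = cta_columns / cta_tables
avg_cpa_per_table = cpa_pairs / cpa_tables

print(f"CTA: about {avg_cta_per_table:.2f} labeled columns per table")
print(f"CPA: about {avg_cpa_per_table:.2f} labeled column pairs per table")
```

Under that assumption, this works out to roughly 2.73 labeled columns per CTA table and 3.62 labeled column pairs per CPA table.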

License


  • Unknown