WDC SOTAB is a benchmark that features two annotation tasks: Column Type Annotation and Columns Property Annotation. The goal of the Column Type Annotation (CTA) task is to annotate the columns of a table with one of 91 Schema.org types, such as telephone, duration, Place, or Organization. The goal of the Columns Property Annotation (CPA) task is to annotate pairs of table columns with one of 176 Schema.org properties, such as gtin13, startDate, priceValidUntil, or recipeIngredient. The benchmark consists of 59,548 tables annotated for CTA and 48,379 tables annotated for CPA, originating from 74,215 different websites. For both tasks, the tables are split into training, validation, and test sets. The tables cover 17 popular Schema.org types, including Product, LocalBusiness, Event, and JobPosting, and are taken from the Schema.org Table Corpus.
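To make the difference between the two tasks concrete, here is a minimal Python sketch of how CTA and CPA annotations could be represented. The class names, field names, table name, and column layout are all hypothetical and chosen for illustration; they are not the benchmark's actual file format. Only the label strings (Place, telephone, startDate) come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CTAAnnotation:
    """A Schema.org type assigned to a single column (hypothetical layout)."""
    table: str
    column: int
    label: str


@dataclass(frozen=True)
class CPAAnnotation:
    """A Schema.org property assigned to a pair of columns (hypothetical layout)."""
    table: str
    column_pair: tuple[int, int]
    label: str


# Hypothetical Event table with columns: 0 = name, 1 = start date,
# 2 = venue, 3 = phone number.
cta = [
    CTAAnnotation("event_table", 2, "Place"),
    CTAAnnotation("event_table", 3, "telephone"),
]

# CPA labels the relation between two columns rather than a single column.
cpa = [
    CPAAnnotation("event_table", (0, 1), "startDate"),
]

# Collect the labels used per task for this table.
cta_labels = sorted(a.label for a in cta)
cpa_labels = sorted(a.label for a in cpa)
```

The point of the sketch is the unit being labeled: one column for CTA, a pair of columns for CPA.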
Key statistics for the two tasks are given in the table below, where "Columns" is the number of columns (for CTA) or column pairs (for CPA) that are labeled, and "Classes" is the number of unique classes used for annotation.
Task | Columns | Classes |
---|---|---|
Columns Property Annotation | 174,998 | 176 |
Column Type Annotation | 162,351 | 91 |
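Combining these counts with the table totals stated earlier gives a rough sense of annotation density. The short Python sketch below computes the average number of labeled columns (CTA) or column pairs (CPA) per annotated table. It assumes the column counts and table counts both cover all splits together; the dictionary layout and the helper name `labeled_per_table` are illustrative only.

```python
# Figures copied from the description and the statistics table above.
stats = {
    "CTA": {"tables": 59_548, "columns": 162_351, "classes": 91},
    "CPA": {"tables": 48_379, "columns": 174_998, "classes": 176},
}


def labeled_per_table(task: str) -> float:
    """Average number of labeled columns (or column pairs) per annotated table."""
    entry = stats[task]
    return entry["columns"] / entry["tables"]


for task in stats:
    print(f"{task}: {labeled_per_table(task):.2f} labeled items per table")
# CTA: 2.73 labeled items per table
# CPA: 3.62 labeled items per table
```

Under that assumption, a table carries on average fewer than four labels per task, so the annotated tables are labeled only on a handful of columns each.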