WikiTables-TURL Dataset | Papers With Code

The WikiTables-TURL dataset was constructed by the authors of [TURL](https://paperswithcode.com/paper/turl-table-understanding-through) and is based on the WikiTable corpus, a large collection of Wikipedia tables. It consists of 580,171 tables divided into fixed training, validation and testing splits. Each table also comes with metadata such as the table name, table caption and column headers.

Of these tables, 406,706 are annotated for the Column Type Annotation (CTA) task, 55,970 for the Columns Property Annotation (CPA) task and 200,744 for the Cell Entity Annotation (CEA) task. The CTA and CPA classes are Freebase types and relations, respectively, while the CEA task uses Freebase entities. The table below lists, for each task and split, the number of annotated columns (cells in the case of CEA), along with the number of classes used for annotation.

| Task | Training  | Validation | Testing | Classes   |
|------|-----------|------------|---------|-----------|
| CTA  | 628,254   | 13,391     | 13,025  | 255       |
| CPA  | 62,954    | 2,175      | 2,072   | 121       |
| CEA  | 1,264,217 | 76,720     | 225,777 | 1,787,737 |
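
As a quick sanity check on the table above, the following minimal sketch sums the annotated units per task and prints the share of each split. The counts are copied from the table. The names `SPLITS` and `split_shares` are illustrative only and are not part of the dataset or of any TURL tooling.

```python
# Annotated-unit counts copied from the table above
# (columns for CTA and CPA, cells for CEA).
SPLITS = {
    "CTA": {"training": 628_254, "validation": 13_391, "testing": 13_025},
    "CPA": {"training": 62_954, "validation": 2_175, "testing": 2_072},
    "CEA": {"training": 1_264_217, "validation": 76_720, "testing": 225_777},
}


def split_shares(counts):
    """Return (total, {split: fraction of total}) for one task."""
    total = sum(counts.values())
    return total, {name: n / total for name, n in counts.items()}


for task, counts in SPLITS.items():
    total, shares = split_shares(counts)
    summary = ", ".join(f"{name} {share:.1%}" for name, share in shares.items())
    print(f"{task}: {total:,} annotated units ({summary})")
```

The per-task totals come to 654,670 columns for CTA, 67,201 columns for CPA and 1,566,714 cells for CEA. Note that CEA reserves a much larger share of its annotations for testing than the other two tasks.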

The authors have made the dataset and its variants publicly available for [download](https://buckeyemailosu-my.sharepoint.com/personal/deng_595_buckeyemail_osu_edu/_layouts/15/onedrive.aspx?id=%2Fpersonal%2Fdeng%5F595%5Fbuckeyemail%5Fosu%5Fedu%2FDocuments%2FBuckeyeBox%20Data%2FTURL&ga=1).

Similar Datasets: WDC SOTAB, T2Dv2, WikipediaGS, VizNet-Sato

| Task | Dataset Variant | Best Model |
|------|-----------------|------------|
| Column Type Annotation | WikiTables-TURL-CTA | TURL |
| Columns Property Annotation | WikiTables-TURL-CPA | TURL |
| Cell Entity Annotation | WikiTables-TURL-CEA | TURL |
