Column Type Annotation

15 papers with code • 12 benchmarks • 10 datasets

Column type annotation (CTA) is the task of predicting the semantic type of a table column and is a subtask of table annotation. The labels used in a CTA problem are usually semantic types from vocabularies such as DBpedia, Schema.org, or Wikidata. Examples include Book, Country, and LocalBusiness.
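
To make the task concrete, here is a minimal Python sketch of one simple baseline, assuming a lookup-based approach: each cell value is matched against a knowledge base, and the column receives the semantic type that the most matched cells share. The dictionary `TOY_KB` and the function `annotate_column` are hypothetical stand-ins; a real system would query a vocabulary such as DBpedia or Wikidata and would also need to handle ambiguous entities, misspellings, and type hierarchies.

```python
from collections import Counter

# Hypothetical toy lookup table: cell value -> candidate semantic types.
# It stands in for a real knowledge base such as DBpedia or Wikidata.
TOY_KB = {
    "France": {"Country"},
    "Japan": {"Country"},
    "Brazil": {"Country"},
    "Dune": {"Book"},
    "Emma": {"Book"},
}


def annotate_column(cells, kb=TOY_KB):
    """Return the semantic type matched by the most cells, or None if no cell matches."""
    votes = Counter()
    for cell in cells:
        # Unknown cells (e.g. "Atlantis") simply contribute no vote.
        for sem_type in kb.get(cell.strip(), ()):
            votes[sem_type] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


print(annotate_column(["France", "Japan", "Atlantis", "Brazil"]))  # Country
```

Majority voting makes the column label robust to a few unmatched or noisy cells, which is why lookup-and-vote schemes are a common starting point before learned models.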

CTA can be treated either as a multi-class classification problem, where each column is annotated with exactly one semantic type, or as a multi-label classification problem, where a column can be annotated with multiple semantic types.
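
The difference between the two formulations shows up in how a model's per-type scores are turned into labels. The sketch below is illustrative only: the type list `TYPES`, the example scores, and the 0.5 threshold are assumptions, not taken from any particular system. Multi-class decoding keeps the single highest-scoring type, while multi-label decoding keeps every type whose score passes the threshold, which lets a column carry both Country and a broader type such as Place.

```python
# Hypothetical semantic type vocabulary (illustrative, not a real benchmark's label set).
TYPES = ["Book", "Country", "LocalBusiness", "Place"]


def decode_multiclass(scores):
    """Multi-class: return exactly one type, the one with the highest score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return TYPES[best]


def decode_multilabel(scores, threshold=0.5):
    """Multi-label: return every type whose independent score reaches the threshold."""
    return [t for t, s in zip(TYPES, scores) if s >= threshold]


# Assumed per-type scores for one column, e.g. from some trained classifier.
column_scores = [0.05, 0.90, 0.10, 0.75]
print(decode_multiclass(column_scores))  # Country
print(decode_multilabel(column_scores))  # ['Country', 'Place']
```

In practice, multi-class models typically normalize scores with a softmax, while multi-label models score each type independently (e.g. with sigmoids), so the threshold becomes a tunable hyperparameter.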

Benchmarks

These leaderboards track progress in column type annotation.

Dataset	Best Model
ToughTables-DBP	KGCODE-Tab
BiodivTab	KGCODE-Tab
ToughTables-WD	DAGOBAH
WDC SOTAB V2	TorchicTab
VizNet-Sato-Full	Watchog
WikiTables-TURL-CTA	TURL
GitTables-SemTab-DBP	KGCODE-Tab
T2Dv2	HNN + P2Vec
WDC SOTAB	DODUO
VizNet-Sato-MultiColumn	DODUO
GitTables-SemTab-SCH	KGCODE-Tab
WikipediaGS-CTA	TURL

Most implemented papers

Sherlock: A Deep Learning Approach to Semantic Data Type Detection

mitmedialab/sherlock-project • 25 May 2019

Correctly detecting the semantic type of data columns is crucial for data science tasks such as automated data cleaning, schema matching, and data discovery.

TABBIE: Pretrained Representations of Tabular Data

SFIG611/tabbie • NAACL 2021

Existing work on tabular representation learning jointly models tables and associated text using self-supervised objective functions derived from pretrained language models such as BERT.

ColNet: Embedding the Semantics of Web Tables for Column Type Prediction

alan-turing-institute/SemAIDA • 4 Nov 2018

Automatically annotating column types with knowledge base (KB) concepts is a critical task to gain a basic understanding of web tables.

Learning Semantic Annotations for Tabular Data

alan-turing-institute/SemAIDA • 30 May 2019

The usefulness of tabular data such as web tables critically depends on understanding their semantics.

Sato: Contextual Semantic Type Detection in Tables

megagonlabs/sato • 14 Nov 2019

Detecting the semantic types of data columns in relational tables is important for various data preparation and information retrieval tasks such as data cleaning, schema matching, data discovery, and semantic search.

TURL: Table Understanding through Representation Learning

sunlab-osu/TURL • 26 Jun 2020

In this paper, we present TURL, a novel framework that introduces the pre-training/fine-tuning paradigm to relational Web tables.

Tough Tables: Carefully Evaluating Entity Linking for Tabular Data

vcutrona/tough-tables • International Semantic Web Conference (ISWC) 2020

Table annotation is a key task to improve querying the Web and support the Knowledge Graph population from legacy sources (tables).

Annotating Columns with Pre-trained Language Models

megagonlabs/doduo • 5 Apr 2021

Inferring meta information about tables, such as column headers or relationships between columns, is an active research topic in data management as we find many tables are missing some of this information.
