Column Type Annotation

16 papers with code • 12 benchmarks • 10 datasets

Column type annotation (CTA) is the task of predicting the semantic type of a table column and is a subtask of Table Annotation. The labels used in a CTA problem are usually semantic types drawn from vocabularies such as DBpedia or Wikidata. Examples include Book, Country, and LocalBusiness.

CTA can be treated either as a multi-class classification problem, where each column is annotated with exactly one semantic type, or as a multi-label classification problem, where a column can be annotated with multiple semantic types.
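The difference between the two formulations can be illustrated with a minimal sketch. It assumes some model has already produced a score per semantic type for a single column; the scores, the function names, and the 0.5 threshold are hypothetical and are not taken from any of the papers listed here.

```python
from typing import Dict, List

# Hypothetical per-type scores for one table column (illustrative values only,
# not the output of any real model).
scores: Dict[str, float] = {
    "Book": 0.10,
    "Country": 0.85,
    "Place": 0.60,
    "LocalBusiness": 0.05,
}


def annotate_multiclass(type_scores: Dict[str, float]) -> str:
    """Multi-class view: return the single highest-scoring type."""
    return max(type_scores, key=type_scores.get)


def annotate_multilabel(type_scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Multi-label view: return every type whose score meets the threshold."""
    return sorted(label for label, score in type_scores.items() if score >= threshold)


print(annotate_multiclass(scores))   # single type for the column
print(annotate_multilabel(scores))   # possibly several types for the column
```

In the multi-class view the column receives only its top-scoring type, while in the multi-label view it may receive several types at once (or none, if no score reaches the threshold). The threshold is a design choice that would normally be tuned on validation data.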

Most implemented papers

Sherlock: A Deep Learning Approach to Semantic Data Type Detection

mitmedialab/sherlock-project 25 May 2019

Correctly detecting the semantic type of data columns is crucial for data science tasks such as automated data cleaning, schema matching, and data discovery.

TABBIE: Pretrained Representations of Tabular Data

SFIG611/tabbie NAACL 2021

Existing work on tabular representation learning jointly models tables and associated text using self-supervised objective functions derived from pretrained language models such as BERT.

ColNet: Embedding the Semantics of Web Tables for Column Type Prediction

alan-turing-institute/SemAIDA 4 Nov 2018

Automatically annotating column types with knowledge base (KB) concepts is a critical task to gain a basic understanding of web tables.

Learning Semantic Annotations for Tabular Data

alan-turing-institute/SemAIDA 30 May 2019

The usefulness of tabular data such as web tables critically depends on understanding their semantics.

Sato: Contextual Semantic Type Detection in Tables

megagonlabs/sato 14 Nov 2019

Detecting the semantic types of data columns in relational tables is important for various data preparation and information retrieval tasks such as data cleaning, schema matching, data discovery, and semantic search.

TURL: Table Understanding through Representation Learning

sunlab-osu/TURL 26 Jun 2020

In this paper, we present TURL, a novel framework that introduces the pre-training/fine-tuning paradigm to relational Web tables.

Tough Tables: Carefully Evaluating Entity Linking for Tabular Data

vcutrona/tough-tables International Semantic Web Conference (ISWC) 2020

Table annotation is a key task to improve querying the Web and support the Knowledge Graph population from legacy sources (tables).

Annotating Columns with Pre-trained Language Models

megagonlabs/doduo 5 Apr 2021

Inferring meta information about tables, such as column headers or relationships between columns, is an active research topic in data management as we find many tables are missing some of this information.

MAGIC: Mining an Augmented Graph using INK, starting from a CSV

IBCNServices/Magic SemTab@ISWC 2021

A large portion of structured data does not yet reap the benefits of the Semantic Web.

JenTab Meets SemTab 2021's New Challenges

fusion-jena/jentab SemTab@ISWC 2021

While tables are a rich source of structured information, their automated use is oftentimes prevented by the inherent ambiguity contained within.