Column Type Annotation

15 papers with code • 12 benchmarks • 10 datasets

Column type annotation (CTA) is the task of predicting the semantic type of a table column; it is a subtask of Table Annotation. CTA labels are usually semantic types drawn from vocabularies such as DBpedia, Schema.org, or Wikidata, for example Book, Country, and LocalBusiness.

CTA can be treated either as a multi-class classification problem, where each column is annotated with exactly one semantic type, or as a multi-label classification problem, where a column can be annotated with multiple semantic types.
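The difference between the two formulations can be illustrated with a minimal Python sketch. It assumes some model has already produced a score per candidate type for a column; the score values, the function names, and the 0.5 threshold are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def annotate_multiclass(scores: Dict[str, float]) -> str:
    """Multi-class CTA: return the single highest-scoring semantic type."""
    return max(scores, key=scores.get)


def annotate_multilabel(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Multi-label CTA: return every semantic type scoring at or above the threshold."""
    return sorted(label for label, score in scores.items() if score >= threshold)


# Hypothetical scores for one column (illustrative values, not real model output).
scores = {"Book": 0.15, "Country": 0.82, "Place": 0.64, "LocalBusiness": 0.05}

single_type = annotate_multiclass(scores)     # one type for the column
multiple_types = annotate_multilabel(scores)  # possibly several types
```

With these scores, the multi-class variant keeps only the top-scoring type, while the multi-label variant also keeps any other type that clears the threshold.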

Most implemented papers

Towards an Approach based on Knowledge Graph Refinement for Tabular Data to Knowledge Graph Matching

jiofidelus/tsotsa SemTab@ISWC 2022

This paper presents our contribution to the Accuracy Track of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab).

SOTAB: The WDC Schema.org Table Annotation Benchmark

wbsg-uni-mannheim/wdc-sotab SemTab@ISWC 2023

This paper presents the WDC Schema.org Table Annotation Benchmark (SOTAB) for comparing the performance of table annotation systems.

Column Type Annotation using ChatGPT

wbsg-uni-mannheim/tabanngpt TaDA@VLDB 2023

Column type annotation is the task of annotating the columns of a relational table with the semantic type of the values contained in each column.

CHORUS: Foundation Models for Unified Data Discovery and Exploration

mkyl/chorus 16 Jun 2023

On all three tasks, we show that a foundation-model-based approach outperforms the task-specific models and so the state of the art.

ArcheType: A Novel Framework for Open-Source Column Type Annotation using Large Language Models

penfever/archetype 27 Oct 2023

We introduce ArcheType, a simple, practical method for context sampling, prompt serialization, model querying, and label remapping, which enables large language models to solve CTA problems in a fully zero-shot manner.
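The four stages named in that description (context sampling, prompt serialization, model querying, label remapping) can be sketched roughly as follows. This is not the ArcheType implementation: every function name, the prompt wording, the fuzzy-matching rule, and the stand-in model are assumptions made for illustration, and a real system would call an actual language model in place of `fake_model`.

```python
import difflib
import random
from typing import Callable, List, Sequence


def sample_context(values: Sequence[str], k: int = 5, seed: int = 0) -> List[str]:
    """Context sampling: pick up to k distinct, non-empty cell values."""
    unique = sorted({str(v).strip() for v in values if str(v).strip()})
    if len(unique) <= k:
        return unique
    return random.Random(seed).sample(unique, k)


def serialize_prompt(samples: Sequence[str], labels: Sequence[str]) -> str:
    """Prompt serialization: turn samples and the label set into plain text."""
    return (
        "Column values: " + ", ".join(samples) + "\n"
        "Choose one type from: " + ", ".join(labels) + "\n"
        "Type:"
    )


def remap_label(raw: str, labels: Sequence[str], fallback: str = "unknown") -> str:
    """Label remapping: map free-form model output onto the allowed label set."""
    lowered = {label.lower(): label for label in labels}
    cleaned = raw.strip().strip(".").lower()
    if cleaned in lowered:
        return lowered[cleaned]
    # Assumed remapping rule: closest label by string similarity.
    close = difflib.get_close_matches(cleaned, list(lowered), n=1, cutoff=0.6)
    return lowered[close[0]] if close else fallback


def annotate_column(
    values: Sequence[str],
    labels: Sequence[str],
    query_model: Callable[[str], str],
    k: int = 5,
) -> str:
    """Run the four stages; query_model is any callable from prompt to text."""
    prompt = serialize_prompt(sample_context(values, k), labels)
    return remap_label(query_model(prompt), labels)


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a language model; always gives the same answer."""
    return " country. "


labels = ["Book", "Country", "LocalBusiness"]
predicted = annotate_column(["Germany", "France", "Japan"], labels, fake_model)
```

Passing the model as a callable keeps the sketch independent of any particular model API, and the remapping step shows why it is needed: the raw model output rarely matches a label string exactly.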