Column Type Annotation
15 papers with code • 12 benchmarks • 10 datasets
Column type annotation (CTA) is the task of predicting the semantic type of a table column and is a subtask of Table Annotation. The labels used in a CTA problem are usually semantic types drawn from vocabularies such as DBpedia, Schema.org, or Wikidata, for example Book, Country, or LocalBusiness.
CTA can be treated either as a multi-class classification problem, where each column is annotated with exactly one semantic type, or as a multi-label classification problem, where a column can be annotated with multiple semantic types.
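The difference between the two formulations can be illustrated with a minimal sketch. It assumes some upstream model has already produced a score per candidate type for a column. The scores, the 0.5 threshold, and the function names below are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def annotate_multiclass(scores: Dict[str, float]) -> str:
    """Multi-class CTA: keep only the single highest-scoring type."""
    return max(scores, key=scores.get)


def annotate_multilabel(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Multi-label CTA: keep every type whose score meets the threshold."""
    return sorted(label for label, score in scores.items() if score >= threshold)


# Hypothetical per-type scores for one column (not taken from any benchmark).
scores = {"Country": 0.81, "Place": 0.64, "Book": 0.03}

single = annotate_multiclass(scores)   # one label for the column
multi = annotate_multilabel(scores)    # possibly several labels
print(single, multi)
```

In the multi-class case the column receives exactly one label, while in the multi-label case a column of country names could plausibly carry both a specific and a more general type.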
Most implemented papers
Towards an Approach based on Knowledge Graph Refinement for Tabular Data to Knowledge Graph Matching
This paper presents our contribution to the Accuracy Track of Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab).
SOTAB: The WDC Schema.org Table Annotation Benchmark
This paper presents the WDC Schema.org Table Annotation Benchmark (SOTAB) for comparing the performance of table annotation systems.
Column Type Annotation using ChatGPT
Column type annotation is the task of annotating the columns of a relational table with the semantic type of the values contained in each column.
CHORUS: Foundation Models for Unified Data Discovery and Exploration
On all three tasks, we show that a foundation-model-based approach outperforms the task-specific models and so the state of the art.
ArcheType: A Novel Framework for Open-Source Column Type Annotation using Large Language Models
We introduce ArcheType, a simple, practical method for context sampling, prompt serialization, model querying, and label remapping, which enables large language models to solve CTA problems in a fully zero-shot manner.
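The four stages named in the last abstract (context sampling, prompt serialization, model querying, label remapping) can be sketched roughly as follows. This is not ArcheType's actual implementation: every function name, the prompt wording, and the fuzzy-matching fallback are assumptions made for illustration, and the language model is replaced by a stub so the example runs offline.

```python
import random
from difflib import get_close_matches
from typing import Callable, List


def sample_context(values: List[str], k: int, seed: int = 0) -> List[str]:
    """Context sampling: pick at most k distinct, non-empty cell values."""
    unique = sorted({v for v in values if v})
    if len(unique) <= k:
        return unique
    return random.Random(seed).sample(unique, k)


def serialize_prompt(sample: List[str], labels: List[str]) -> str:
    """Prompt serialization: turn the sample and label set into plain text."""
    return (
        "Column values: " + ", ".join(sample) + "\n"
        "Choose one type from: " + ", ".join(labels) + "\n"
        "Type:"
    )


def remap_label(raw: str, labels: List[str]) -> str:
    """Label remapping: map free-form model output onto a permitted label."""
    cleaned = raw.strip().lower()
    lowered = {label.lower(): label for label in labels}
    if cleaned in lowered:
        return lowered[cleaned]
    # Fall back to the closest label by string similarity (an assumption).
    best = get_close_matches(cleaned, list(lowered), n=1, cutoff=0.0)
    return lowered[best[0]]


def annotate_column(values: List[str], labels: List[str],
                    model: Callable[[str], str], k: int = 5) -> str:
    """Run the four stages in order for a single column."""
    prompt = serialize_prompt(sample_context(values, k), labels)
    return remap_label(model(prompt), labels)  # model querying happens here


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a language model call; returns noisy text."""
    return " country.\n"


labels = ["Book", "Country", "LocalBusiness"]
result = annotate_column(["France", "Japan", "Brazil", "France", ""], labels, stub_model)
print(result)
```

The remapping step matters in a zero-shot setting because a model's raw output is free text and may not exactly match any label in the target vocabulary.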