Column Type Annotation
13 papers with code • 11 benchmarks • 9 datasets
Column type annotation (CTA) is the task of predicting the semantic type of a table column; it is a subtask of table annotation. The labels used in a CTA problem are usually semantic types drawn from vocabularies such as DBpedia, Schema.org, or Wikidata, for example Book, Country, or LocalBusiness.
CTA can be treated either as a multi-class classification problem, where each column is annotated with exactly one semantic type, or as a multi-label classification problem, where a column can be annotated with multiple semantic types.
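The difference between the two formulations can be illustrated with a minimal Python sketch. It assumes some model has already produced a score per candidate type for one column; the scores, the 0.5 threshold, and the function names `annotate_multiclass` and `annotate_multilabel` are hypothetical and are not taken from any of the papers listed here.

```python
from typing import Dict, List


def annotate_multiclass(scores: Dict[str, float]) -> str:
    """Multi-class CTA: return the single highest-scoring semantic type."""
    return max(scores, key=scores.get)


def annotate_multilabel(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Multi-label CTA: return every type whose score reaches the threshold, sorted by name."""
    return sorted(label for label, score in scores.items() if score >= threshold)


# Hypothetical per-type scores for one table column (illustrative values only).
column_scores = {"Book": 0.10, "Country": 0.85, "LocalBusiness": 0.05, "Place": 0.70}

single_type = annotate_multiclass(column_scores)     # one label for the column
multiple_types = annotate_multilabel(column_scores)  # possibly several labels

print(single_type)
print(multiple_types)
```

In the multi-class setting the column receives only its best-scoring type, while in the multi-label setting it can also keep a broader type that scores above the threshold.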
Most implemented papers
Sherlock: A Deep Learning Approach to Semantic Data Type Detection
Correctly detecting the semantic type of data columns is crucial for data science tasks such as automated data cleaning, schema matching, and data discovery.
ColNet: Embedding the Semantics of Web Tables for Column Type Prediction
Automatically annotating column types with knowledge base (KB) concepts is a critical task to gain a basic understanding of web tables.
Learning Semantic Annotations for Tabular Data
The usefulness of tabular data such as web tables critically depends on understanding their semantics.
Sato: Contextual Semantic Type Detection in Tables
Detecting the semantic types of data columns in relational tables is important for various data preparation and information retrieval tasks such as data cleaning, schema matching, data discovery, and semantic search.
TURL: Table Understanding through Representation Learning
In this paper, we present TURL, a novel framework that introduces the pre-training/fine-tuning paradigm to relational Web tables.
Tough Tables: Carefully Evaluating Entity Linking for Tabular Data
Table annotation is a key task to improve querying the Web and support the Knowledge Graph population from legacy sources (tables).
Annotating Columns with Pre-trained Language Models
Inferring meta information about tables, such as column headers or relationships between columns, is an active research topic in data management as we find many tables are missing some of this information.
TABBIE: Pretrained Representations of Tabular Data
Existing work on tabular representation learning jointly models tables and associated text using self-supervised objective functions derived from pretrained language models such as BERT.
MAGIC: Mining an Augmented Graph using INK, starting from a CSV
A large portion of structured data does not yet reap the benefits of the Semantic Web.
JenTab Meets SemTab 2021's New Challenges
While tables are a rich source of structured information, their automated use is oftentimes prevented by the inherent ambiguity contained within.