JenTab Meets SemTab 2021's New Challenges

SemTab@ISWC 2021 · Nora Abdelmageed, Sirko Schindler

While tables are a rich source of structured information, their automated use is often prevented by their inherent ambiguity. Issues ranging from simple typos and inconsistent naming conventions to homonymy among values pose substantial barriers to exploiting this source of knowledge. Although the Semantic Web can alleviate many of these issues, the actual annotation process remains challenging. To foster new ideas and the improvement of existing approaches, the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab) has hosted yearly competitions since 2019, allowing systems to present their current capabilities. Datasets of different origins and characteristics highlight the various challenges present in this area. In this paper, we report on the evolution of our system, “JenTab”, during SemTab 2021. We redesigned the system architecture, optimized individual modules, and developed various pipelines to target specific problems posed throughout the challenge. JenTab is among the top 5 systems in the first two rounds of SemTab 2021. The results demonstrate JenTab’s flexibility and its ability to quickly address new challenges.

Task                    Dataset          Model   Metric Name  Metric Value  Global Rank
Cell Entity Annotation  BiodivTab        JenTab  F1 (%)       60.2          #4
Column Type Annotation  BiodivTab        JenTab  F1 (%)       10.7          #6
Column Type Annotation  ToughTables-DBP  JenTab  F1 (%)       46            #2
Cell Entity Annotation  ToughTables-DBP  JenTab  F1 (%)       60.7          #3
Cell Entity Annotation  ToughTables-WD   JenTab  F1 (%)       45.7          #5
Column Type Annotation  ToughTables-WD   JenTab  F1 (%)       69.7          #3
