JenTab Meets SemTab 2021's New Challenges
While tables are a rich source of structured information, their automated use is often hindered by inherent ambiguity. Issues ranging from simple typos and inconsistent naming conventions to homonymy among values pose substantial barriers to exploiting this source of knowledge. Although the Semantic Web can alleviate many of these issues, the actual annotation process remains challenging. To foster new ideas and improve existing approaches, the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab) has hosted yearly competitions since 2019 in which systems can demonstrate their current capabilities. Datasets of different origins and characteristics highlight the various challenges in this area. In this paper, we report on the evolution of our system, “JenTab”, during SemTab 2021. We redesigned the system architecture, optimized individual modules, and developed various pipelines to target specific problems posed throughout the challenge. JenTab is among the top 5 systems in the first two rounds of SemTab 2021. The results demonstrate JenTab’s flexibility and its ability to quickly address new challenges.
Task | Dataset | Model | Metric Name | Metric Value | Global Rank
---|---|---|---|---|---
Cell Entity Annotation | BiodivTab | JenTab | F1 (%) | 60.2 | # 4
Column Type Annotation | BiodivTab | JenTab | F1 (%) | 10.7 | # 6
Column Type Annotation | ToughTables-DBP | JenTab | F1 (%) | 46 | # 2
Cell Entity Annotation | ToughTables-DBP | JenTab | F1 (%) | 60.7 | # 3
Cell Entity Annotation | ToughTables-WD | JenTab | F1 (%) | 45.7 | # 5
Column Type Annotation | ToughTables-WD | JenTab | F1 (%) | 69.7 | # 3