2 code implementations • DI2KG: International Workshop on Challenges and Experiences from Data Integration to Knowledge Graphs @ VLDB 2020 • Ralph Peeters, Christian Bizer, Goran Glavas
Adding the masked language modeling objective in the intermediate training step in order to further adapt the language model to the application domain leads to an additional increase of up to 3% F1.
Ranked #1 on Entity Resolution on WDC Computers-small (using extra training data)
1 code implementation • 4 Feb 2022 • Ralph Peeters, Christian Bizer
We thus conclude that contrastive pre-training has a high potential for product matching use cases in which explicit supervision is available.
Ranked #1 on Entity Resolution on WDC Computers-xlarge
1 code implementation • 5 May 2023 • Ralph Peeters, Christian Bizer
Always using the same set of 10 handpicked demonstrations leads to an improvement of 4.92% over the zero-shot performance.
1 code implementation • 17 Oct 2023 • Ralph Peeters, Christian Bizer
We show that for use cases that do not allow data to be shared with third parties, open-source LLMs can be a viable alternative to hosted LLMs given that a small amount of training data or matching knowledge...
Ranked #1 on Entity Resolution on Amazon-Google
1 code implementation • Proceedings of the VLDB Endowment 2021 • Ralph Peeters, Christian Bizer
The task can be approached by learning a binary classifier which distinguishes pairs of entity descriptions for the same real-world entity from descriptions of different entities.
Ranked #1 on Entity Resolution on WDC Watches-xlarge
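The pairwise formulation described in this entry can be illustrated with a minimal sketch. It is not the authors' model: it assumes a single token-overlap (Jaccard) feature and learns only a decision threshold from a few made-up labeled pairs. The function names `jaccard`, `fit_threshold`, and `predict` and the toy product strings are hypothetical.

```python
def jaccard(a, b):
    """Token-level Jaccard similarity between two entity descriptions."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def fit_threshold(pairs, labels):
    """Pick the similarity threshold with the highest training accuracy."""
    scores = [jaccard(a, b) for a, b in pairs]
    best_t, best_acc = 0.0, -1.0
    for t in sorted(set(scores)):
        acc = sum((s >= t) == y for s, y in zip(scores, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def predict(pair, threshold):
    """Classify a pair as match (True) or non-match (False)."""
    return jaccard(*pair) >= threshold


# Toy training data (illustrative only): True = same real-world entity.
train_pairs = [
    ("canon eos 80d camera body", "canon eos 80d dslr camera"),
    ("seiko 5 automatic watch", "seiko 5 automatic mens watch"),
    ("canon eos 80d camera body", "nikon d3500 camera body"),
    ("seiko 5 automatic watch", "casio g-shock digital watch"),
]
train_labels = [True, True, False, False]

threshold = fit_threshold(train_pairs, train_labels)
print(predict(("canon eos 80d camera", "canon eos 80d camera kit"), threshold))
print(predict(("canon eos 80d camera", "sony alpha a7 camera"), threshold))
```

A learned matcher would replace the single hand-written feature with a trained representation of the pair; the sketch only shows the binary match/non-match decision over pairs of descriptions.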
1 code implementation • 19 Oct 2023 • Alexander Brinkmann, Roee Shraga, Christian Bizer
We propose different prompt templates for instructing LLMs about the target schema of the extraction, covering both zero-shot and few-shot scenarios.
1 code implementation • TaDA@VLDB 2023 • Keti Korini, Christian Bizer
Column type annotation is the task of annotating the columns of a relational table with the semantic type of the values contained in each column.
Ranked #2 on Column Type Annotation on WDC SOTAB V2
1 code implementation • EDBT 2017 • Dominique Ritze, Christian Bizer
This paper contributes to improving the understanding of the utility of different features for web table to knowledge base matching by reimplementing different matching techniques as well as similarity score aggregation methods from the literature within a single matching framework and evaluating different combinations of these techniques against a single gold standard.
Ranked #1 on Row Annotation on T2Dv2
1 code implementation • International Conference on Information & Knowledge Management 2020 • Anna Primpeli, Christian Bizer
In order to enable the exact reproducibility of evaluation results, matching tasks need to contain exactly defined sets of matching and non-matching record pairs, as well as a fixed development and test split.
Ranked #4 on Entity Resolution on Amazon-Google
1 code implementation • 23 Jan 2023 • Ralph Peeters, Reng Chiz Der, Christian Bizer
It also shows that, for entity matching, contrastive learning is more training-data-efficient than cross-encoders.
1 code implementation • SemTab@ISWC 2023 • Keti Korini, Ralph Peeters, Christian Bizer
This paper presents the WDC Schema.org Table Annotation Benchmark (SOTAB) for comparing the performance of table annotation systems.
Ranked #1 on Columns Property Annotation on WDC SOTAB
1 code implementation • 6 Mar 2023 • Alexander Brinkmann, Roee Shraga, Christian Bizer
To reduce these runtimes, entity resolution pipelines are constructed of two parts: a blocker that applies a computationally cheap method to select candidate record pairs, and a matcher that afterwards identifies matching pairs from this set using more expensive methods.
Ranked #1 on Blocking on Amazon-Google
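The two-part pipeline described in this entry can be sketched as follows. This is an illustrative toy, not the method of the listed paper: it assumes token-overlap blocking through an inverted index and a character-level similarity matcher from Python's standard library. The names `block` and `match`, the threshold of 0.6, and the sample records are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def block(records):
    """Cheap blocker: a pair becomes a candidate if it shares at least one token."""
    index = defaultdict(set)
    for rid, text in records.items():
        for tok in set(text.lower().split()):
            index[tok].add(rid)
    candidates = set()
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            candidates.add((a, b))
    return candidates


def match(records, candidates, threshold=0.6):
    """More expensive matcher: character-level similarity, run on candidates only."""
    matches = set()
    for a, b in candidates:
        score = SequenceMatcher(None, records[a].lower(), records[b].lower()).ratio()
        if score >= threshold:
            matches.add((a, b))
    return matches


# Toy records (illustrative only).
records = {
    1: "Apple iPhone 12 64GB black",
    2: "apple iphone 12 64gb (black)",
    3: "Samsung Galaxy S21 128GB",
    4: "Logitech MX Master 3 mouse",
}

candidates = block(records)
matches = match(records, candidates)
print(len(candidates), "candidate pairs out of", len(records) * (len(records) - 1) // 2)
print(matches)
```

The blocker never compares descriptions directly; it only groups record ids by shared tokens, so the costlier similarity computation runs on a small subset of all possible pairs.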
1 code implementation • 7 Oct 2021 • Ralph Peeters, Christian Bizer
Using the use case of matching product offers from different e-shops, this poster explores to what extent it is possible to improve the performance of Transformer-based matchers by complementing a small set of training pairs in the target language, German in our case, with a larger set of English-language training pairs.
1 code implementation • International Semantic Web Conference 2021 • Anna Primpeli, Christian Bizer
ALMSER exploits the rich correspondence graph that exists in multi-source settings for selecting informative record pairs.
Ranked #1 on Entity Resolution on MusicBrainz20K
1 code implementation • 23 Jun 2023 • Alexander Brinkmann, Roee Shraga, Reng Chiz Der, Christian Bizer
Hence, extracting attribute/value pairs from textual product descriptions is an essential enabler for e-commerce applications.
1 code implementation • 4 Mar 2024 • Nick Baumann, Alexander Brinkmann, Christian Bizer
For our experiments, we introduce the WDC Product Attribute-Value Extraction (WDC PAVE) dataset.
no code implementations • LREC 2012 • Pablo Mendes, Joachim Daiber, Rohana Rajapakse, Felix Sasaki, Christian Bizer
In this paper we evaluate the impact of the phrase recognition step on the ability of the system to correctly reproduce the annotations of a gold standard in an unsupervised setting.
no code implementations • LREC 2012 • Pablo Mendes, Max Jakob, Christian Bizer
The DBpedia project extracts structured information from Wikipedia editions in 97 different languages and combines this information into a large multi-lingual knowledge base covering many specific domains and general world knowledge.
no code implementations • LREC 2016 • Julian Seitner, Christian Bizer, Kai Eckert, Stefano Faralli, Robert Meusel, Heiko Paulheim, Simone Paolo Ponzetto
Hypernymy relations (those where a hyponym term shares an "isa" relationship with its hypernym) play a key role for many Natural Language Processing (NLP) tasks, e.g. ontology learning, automatically building or extending knowledge bases, or word sense disambiguation and induction.