CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information

1 Feb 2019 · Shikhar Vashishth, Prince Jain, Partha Talukdar

Open Information Extraction (OpenIE) methods extract (noun phrase, relation phrase, noun phrase) triples from text, resulting in the construction of large Open Knowledge Bases (Open KBs). The noun phrases (NPs) and relation phrases in such Open KBs are not canonicalized, leading to the storage of redundant and ambiguous facts. Recent research has posed canonicalization of Open KBs as clustering over manually defined feature spaces. Manual feature engineering is expensive and often sub-optimal. To overcome this challenge, we propose Canonicalization using Embeddings and Side Information (CESI), a novel approach which performs canonicalization over learned embeddings of Open KBs. CESI extends recent advances in KB embedding by incorporating relevant NP and relation phrase side information in a principled manner. Through extensive experiments on multiple real-world datasets, we demonstrate CESI's effectiveness.
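Canonicalization as clustering over embeddings can be illustrated with a small sketch. It assumes the NP embeddings have already been learned (here they are made-up 3-d toy vectors, not CESI's embeddings or side information) and swaps in complete-linkage agglomerative clustering with a cosine-similarity threshold as an illustrative clustering step. The helper names `cosine_similarity` and `canonicalize` and the `threshold=0.8` default are hypothetical; this is not the authors' implementation.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def canonicalize(embeddings, threshold=0.8):
    """Group phrases by embedding similarity (hypothetical helper).

    Complete-linkage agglomerative clustering: the similarity of two clusters
    is the minimum cosine similarity over all cross-cluster pairs. The most
    similar pair of clusters is merged until no pair reaches `threshold`.
    """
    clusters = [[phrase] for phrase in embeddings]
    while True:
        best_pair, best_score = None, threshold
        for i, j in combinations(range(len(clusters)), 2):
            score = min(
                cosine_similarity(embeddings[a], embeddings[b])
                for a in clusters[i]
                for b in clusters[j]
            )
            if score >= best_score:
                best_pair, best_score = (i, j), score
        if best_pair is None:
            return clusters
        i, j = best_pair  # i < j, so deleting j leaves index i valid
        clusters[i].extend(clusters[j])
        del clusters[j]


# Toy vectors standing in for learned NP embeddings (made up for illustration).
toy_embeddings = {
    "Barack Obama": [0.9, 0.1, 0.0],
    "Obama": [0.85, 0.15, 0.05],
    "New York City": [0.0, 0.2, 0.95],
    "NYC": [0.05, 0.1, 0.9],
    "Paris": [0.1, 0.9, 0.1],
}

print(canonicalize(toy_embeddings))
# [['Barack Obama', 'Obama'], ['New York City', 'NYC'], ['Paris']]
```

Each resulting cluster plays the role of one canonical entity. Complete linkage is a conservative choice: a phrase joins a cluster only if it is similar to every existing member, which limits chaining of loosely related phrases. The threshold trades precision against recall and would need tuning on real data.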

Task: Noun Phrase Canonicalization. Model: CESI. Metric values with global rank in parentheses.

Dataset            Macro Precision  Micro Precision  Pairwise Precision
Ambiguous Dataset  66.2 (#1)        92.4 (#1)        91.9 (#1)
Base Dataset       98.2 (#2)        99.8 (#2)        99.9 (#2)
ReVerb45K          62.7 (#1)        84.4 (#1)        81.9 (#1)
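The leaderboard entries report Macro, Micro and Pairwise Precision for NP clusters. A minimal sketch of how such metrics are commonly computed against gold entity labels follows, assuming these definitions: macro precision is the fraction of predicted clusters whose members all share one gold entity, micro precision is cluster purity, and pairwise precision is the fraction of same-cluster pairs that are also co-referent in the gold labels. The exact definitions used for the reported numbers should be checked against the paper; the reported values appear to be such fractions expressed as percentages. Function names, labels and data below are hypothetical.

```python
from collections import Counter
from itertools import combinations


def macro_precision(clusters, gold):
    """Fraction of predicted clusters whose members all map to one gold entity."""
    pure = sum(1 for c in clusters if len({gold[x] for x in c}) == 1)
    return pure / len(clusters)


def micro_precision(clusters, gold):
    """Purity: each cluster is credited with its largest single-entity overlap."""
    total = sum(len(c) for c in clusters)
    hits = sum(Counter(gold[x] for x in c).most_common(1)[0][1] for c in clusters)
    return hits / total


def pairwise_precision(clusters, gold):
    """Fraction of same-cluster pairs that also share a gold entity."""
    pairs = [(a, b) for c in clusters for a, b in combinations(c, 2)]
    if not pairs:
        return 0.0  # convention assumed here when no cluster has two members
    hits = sum(1 for a, b in pairs if gold[a] == gold[b])
    return hits / len(pairs)


# Hypothetical gold labels and predicted clusters, for illustration only.
gold_entity = {
    "Barack Obama": "E1", "Obama": "E1",
    "New York City": "E2", "NYC": "E2",
    "Paris": "E3", "Paris Hilton": "E4",
}
predicted = [["Barack Obama", "Obama"], ["New York City", "NYC"], ["Paris", "Paris Hilton"]]

for metric in (macro_precision, micro_precision, pairwise_precision):
    print(metric.__name__, round(metric(predicted, gold_entity), 3))
# macro_precision 0.667
# micro_precision 0.833
# pairwise_precision 0.667
```

The single wrong merge ("Paris" with "Paris Hilton") costs a whole cluster under macro precision, only one element under micro precision, and one of three pairs under pairwise precision, which is why the three metrics can diverge widely on the same output.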

