Behavioral Testing of Knowledge Graph Embedding Models for Link Prediction

Knowledge graph embedding (KGE) models are often used to encode knowledge graphs in order to predict new links inside the graph. The accuracy of these methods is typically evaluated by computing an averaged accuracy metric on a held-out test set. This approach, however, does not reveal where the models systematically fail or succeed. To address this challenge, we propose a new evaluation framework that builds on the idea of (black-box) behavioral testing, a software engineering principle that enables users to detect system failures before deployment. With behavioral tests, we can directly target and evaluate the behavior of KGE models on capabilities deemed important in the context of a particular use case. To this end, we leverage existing knowledge graph schemas to design behavioral tests for the link prediction task. With an extensive set of experiments, we perform and analyze these tests for several KGE models. Crucially, we find, for example, that a model ranked second to last on the original test set actually performs best when tested for a specific capability. Such insights allow users to better choose which KGE model might be most suitable for a particular task. The framework is extendable to additional behavioral tests, and we hope to inspire fellow researchers to join us in collaboratively growing this framework.
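To make the idea concrete, here is a minimal, hypothetical sketch of one kind of schema-based behavioral test the abstract describes: checking whether a KGE model ranks tail candidates that satisfy a relation's range type above candidates that violate it. The TransE-style scorer, the toy embeddings, and the schema entries below are all illustrative assumptions, not the paper's actual implementation or data.

```python
import math

# --- Toy data, purely illustrative (not from the paper) ---
ENTITY_VECS = {
    "alice": [0.0, 0.0],   # Person
    "bob": [5.0, 5.0],     # Person
    "paris": [1.0, 0.0],   # City
    "london": [0.9, 0.1],  # City
}
ENTITY_TYPE = {"alice": "Person", "bob": "Person",
               "paris": "City", "london": "City"}
RELATION_VECS = {"born_in": [1.0, 0.0]}
RANGE_TYPE = {"born_in": "City"}  # schema: the tail of born_in must be a City


def transe_score(head, relation, tail):
    """TransE-style plausibility score: -||h + r - t||.
    Higher (less negative) means more plausible."""
    hv, rv, tv = ENTITY_VECS[head], RELATION_VECS[relation], ENTITY_VECS[tail]
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(hv, rv, tv)))


def range_consistency_test(head, relation):
    """Behavioral test: for the query (head, relation, ?), every
    schema-consistent tail candidate must outrank every schema-violating one."""
    allowed = RANGE_TYPE[relation]
    candidates = [e for e in ENTITY_VECS if e != head]
    consistent = [transe_score(head, relation, t)
                  for t in candidates if ENTITY_TYPE[t] == allowed]
    violating = [transe_score(head, relation, t)
                 for t in candidates if ENTITY_TYPE[t] != allowed]
    return min(consistent) > max(violating)


print(range_consistency_test("alice", "born_in"))  # True on this toy data
```

A test like this probes a single capability (respecting the schema's range constraints) in isolation, so a model can pass or fail it independently of its averaged rank-based metrics on the full test set.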
