Edge Proposal Sets for Link Prediction

Graphs are a common model for complex relational data such as social networks and protein interactions, and such data can evolve over time (e.g., new friendships) and be noisy (e.g., unmeasured interactions). Link prediction aims to predict future edges or infer missing edges in the graph, and has diverse applications in recommender systems, experimental design, and complex systems. Even though link prediction algorithms strongly depend on the set of edges in the graph, existing approaches typically do not modify the graph topology to improve performance. Here, we demonstrate how simply adding a set of edges, which we call a \emph{proposal set}, to the graph as a pre-processing step can improve the performance of several link prediction algorithms. The underlying idea is that if the edges in the proposal set generally align with the structure of the graph, link prediction algorithms are further guided towards predicting the right edges; in other words, adding a proposal set of edges is a signal-boosting pre-processing step. We show how to use existing link prediction algorithms to generate effective proposal sets and evaluate this approach on various synthetic and empirical datasets. We find that proposal sets meaningfully improve the accuracy of link prediction algorithms based on both neighborhood heuristics and graph neural networks. Code is available at \url{https://github.com/CUAI/Edge-Proposal-Sets}.
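The pre-processing step described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation (see the linked repository for that). The function names, the toy graph, and the pairing of Adamic-Adar as the proposer with Resource Allocation as the final predictor are all hypothetical choices made here for illustration.

```python
import math
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def adamic_adar(adj, u, v):
    """Sum of 1/log(degree) over the common neighbors of u and v."""
    # A common neighbor is adjacent to both u and v, so its degree is >= 2
    # and log(degree) is never zero.
    return sum(1.0 / math.log(len(adj[w])) for w in adj[u] & adj[v])


def resource_allocation(adj, u, v):
    """Sum of 1/degree over the common neighbors of u and v."""
    return sum(1.0 / len(adj[w]) for w in adj[u] & adj[v])


def top_k_non_edges(adj, score, k):
    """Return the k highest-scoring node pairs that are not yet edges."""
    candidates = [
        (u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]
    ]
    candidates.sort(key=lambda pair: score(adj, *pair), reverse=True)
    return candidates[:k]


def add_proposal_set(adj, proposer, k):
    """Copy the graph and add the proposer's top-k predictions as edges."""
    augmented = {node: set(nbrs) for node, nbrs in adj.items()}
    for u, v in top_k_non_edges(adj, proposer, k):
        augmented[u].add(v)
        augmented[v].add(u)
    return augmented


# Toy graph (invented for illustration): two triangles joined through node 3.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
adj = build_adjacency(edges)

# Step 1: one heuristic proposes edges, which are added to the graph.
augmented = add_proposal_set(adj, adamic_adar, k=2)

# Step 2: a second heuristic predicts links on the augmented graph.
predictions = top_k_non_edges(augmented, resource_allocation, k=3)
print(predictions)
```

The sketch copies the graph before adding the proposal set so the original edges remain available for evaluation. The proposal-set size `k` is left as a free parameter here; how it should be chosen is not specified in this summary.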


Results from the Paper

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Link Property Prediction | ogbl-collab | Adamic Adar + Edge Proposal Set | Test Hits@50 | 0.6548 ± 0.0000 | #12 |
| | | | Validation Hits@50 | 0.9735 ± 0.0000 | #7 |
| | | | Number of params | 0 | #26 |
| | | | Ext. data | No | #1 |
| Link Property Prediction | ogbl-ddi | GraphSAGE + Edge Proposal Set | Test Hits@20 | 0.7495 ± 0.0317 | #15 |
| | | | Validation Hits@20 | 0.6696 ± 0.0198 | #20 |
| | | | Number of params | 1421057 | #19 |
| | | | Ext. data | No | #1 |
| Link Property Prediction | ogbl-ppa | RA + Edge Proposal Set | Test Hits@100 | 0.5324 ± 0.0000 | #8 |
| | | | Validation Hits@100 | 0.5142 ± 0.0000 | #9 |
| | | | Number of params | 0 | #20 |
| | | | Ext. data | No | #1 |

