Graph InfoClust: Leveraging cluster-level node information for unsupervised graph representation learning

15 Sep 2020 · Costas Mavromatis, George Karypis

Unsupervised (or self-supervised) graph representation learning is essential to facilitate various graph data mining tasks when external supervision is unavailable. The challenge is to encode the information about the graph structure and the attributes associated with the nodes and edges into a low-dimensional space. Most existing unsupervised methods promote similar representations across nodes that are topologically close. Recently, it was shown that leveraging additional graph-level information, e.g., information that is shared among all nodes, encourages the representations to be mindful of the global properties of the graph, which greatly improves their quality. However, in most graphs, there is significantly more structure that can be captured, e.g., nodes tend to belong to (multiple) clusters that represent structurally similar nodes. Motivated by this observation, we propose a graph representation learning method called Graph InfoClust (GIC) that seeks to additionally capture cluster-level information content. These clusters are computed by a differentiable K-means method and are jointly optimized by maximizing the mutual information between nodes of the same cluster. This optimization leads the node representations to capture richer information and nodal interactions, which improves their quality. Experiments show that GIC outperforms state-of-the-art methods in various downstream tasks (node classification, link prediction, and node clustering) with a 0.9% to 6.1% gain over the best competing approach, on average.
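The abstract names a differentiable K-means step but does not spell out its formulation. As a rough illustration only, the sketch below shows one common way to make K-means smooth: hard assignments are replaced by a softmax over negative squared distances, so every step is a differentiable function of the inputs. The function names, the `beta` sharpness parameter, and the choice of squared Euclidean distance are assumptions made for this example, not details taken from the paper, and the mutual-information objective is not covered here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_assign(points, centroids, beta):
    """Soft cluster memberships: softmax over negative squared distances.

    `beta` (hypothetical parameter) controls sharpness; a larger value
    pushes the memberships closer to a hard K-means assignment.
    """
    memberships = []
    for p in points:
        scores = [
            -beta * sum((pi - ci) ** 2 for pi, ci in zip(p, c))
            for c in centroids
        ]
        memberships.append(softmax(scores))
    return memberships


def soft_kmeans(points, init_centroids, beta=5.0, iterations=10):
    """Generic soft K-means (illustrative, not the paper's exact method)."""
    centroids = [list(c) for c in init_centroids]
    dim = len(points[0])
    for _ in range(iterations):
        memberships = soft_assign(points, centroids, beta)
        for j in range(len(centroids)):
            # Small constant guards against an empty (zero-weight) cluster.
            weight = sum(r[j] for r in memberships) + 1e-12
            centroids[j] = [
                sum(r[j] * p[d] for r, p in zip(memberships, points)) / weight
                for d in range(dim)
            ]
    # Recompute memberships so they match the final centroids.
    return soft_assign(points, centroids, beta), centroids


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
    memberships, centroids = soft_kmeans(pts, [(0.0, 0.0), (5.0, 5.0)])
    print(centroids)
    print(memberships)
```

In a real training setup this computation would typically be written in an automatic-differentiation framework so that gradients flow from the cluster summaries back into the node encoder; plain Python is used here only to keep the example self-contained.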

Results from the Paper

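| Task | Dataset | Model | Metric | Value | Global Rank |
| Node Classification | AMZ Comp | Graph InfoClust (GIC) | Accuracy | 81.5 ± 1.0 | #3 |
| Node Classification | AMZ Photo | Graph InfoClust (GIC) | Accuracy | 90.4 ± 1.0 | #6 |
| Node Classification | Citeseer | Graph InfoClust (GIC) | Accuracy | 71.9 ± 1.4 | #42 |
| Link Prediction | Citeseer | Graph InfoClust (GIC) | AUC | 97 | #1 |
| Link Prediction | Citeseer | Graph InfoClust (GIC) | AP | 96.8 | #3 |
| Node Clustering | Citeseer | Graph InfoClust (GIC) | Accuracy | 69.6 | #2 |
| Node Clustering | Citeseer | Graph InfoClust (GIC) | NMI | 45.3 | #1 |
| Node Clustering | Citeseer | Graph InfoClust (GIC) | ARI | 46.5 | #2 |
| Node Classification | Coauthor CS | Graph InfoClust (GIC) | Accuracy | 89.4 ± 0.4 | #11 |
| Node Classification | Coauthor Phy | Graph InfoClust (GIC) | Accuracy | 93.1 ± 0.7 | #2 |
| Link Prediction | Cora | sGraphite-VAE | AUC | 93.7% | #6 |
| Link Prediction | Cora | sGraphite-VAE | AP | 93.5% | #6 |
| Node Clustering | Cora | Graph InfoClust (GIC) | Accuracy | 72.5 | #4 |
| Node Clustering | Cora | Graph InfoClust (GIC) | NMI | 53.7 | #7 |
| Node Clustering | Cora | Graph InfoClust (GIC) | ARI | 50.8 | #3 |
| Node Classification | Cora: fixed 20 node per class | Graph InfoClust (GIC) | Accuracy | 81.7 ± 1.5 | #5 |
| Node Clustering | Pubmed | Graph InfoClust (GIC) | Accuracy | 67.3 | #5 |
| Node Clustering | Pubmed | Graph InfoClust (GIC) | NMI | 31.9 | #5 |
| Node Clustering | Pubmed | Graph InfoClust (GIC) | ARI | 29.1 | #3 |
| Link Prediction | Pubmed | Graph InfoClust (GIC) | AUC | 93.7% | #8 |
| Link Prediction | Pubmed | Graph InfoClust (GIC) | AP | 93.5% | #8 |
| Node Classification | Pubmed | Graph InfoClust (GIC) | Accuracy | 77.4 ± 1.9 | #53 |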