Rethinking Graph Auto-Encoder Models for Attributed Graph Clustering

19 Jul 2021 · Nairouz Mrabah, Mohamed Bouguessa, Mohamed Fawzi Touati, Riadh Ksantini

Most recent graph clustering methods have resorted to Graph Auto-Encoders (GAEs) to perform joint clustering and embedding learning. However, two critical issues have been overlooked. First, the accumulative error, inflicted by learning with noisy clustering assignments, degrades the effectiveness and robustness of the clustering model. This problem is called Feature Randomness. Second, reconstructing the adjacency matrix sets the model to learn irrelevant similarities for the clustering task. This problem is called Feature Drift. Interestingly, the theoretical relation between the aforementioned problems has not yet been investigated. We study these issues from two aspects: (1) there is a trade-off between Feature Randomness and Feature Drift when clustering and reconstruction are performed at the same level, and (2) the problem of Feature Drift is more pronounced for GAE models, compared with vanilla auto-encoder models, due to the graph convolutional operation and the graph decoding design. Motivated by these findings, we reformulate the GAE-based clustering methodology. Our solution is two-fold. First, we propose a sampling operator $\Xi$ that triggers a protection mechanism against the noisy clustering assignments. Second, we propose an operator $\Upsilon$ that triggers a correction mechanism against Feature Drift by gradually transforming the reconstructed graph into a clustering-oriented one. As principal advantages, our solution grants a considerable improvement in clustering effectiveness and robustness and can be easily tailored to existing GAE models.
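The abstract does not define $\Xi$ beyond saying that it samples nodes to guard against noisy clustering assignments. As a rough illustration of that idea only, the sketch below keeps a node for the clustering objective when its soft assignment is both confident and clearly separated from the runner-up cluster. The function name `select_reliable_nodes`, the two thresholds, and the selection rule itself are assumptions made for this example, not the authors' definition.

```python
def select_reliable_nodes(soft_assignments, min_confidence=0.7, min_margin=0.3):
    """Return indices of nodes whose soft cluster assignment looks trustworthy.

    Hypothetical rule (an assumption, not taken from the abstract): keep a node
    when its top probability is at least `min_confidence` and beats the
    second-best probability by at least `min_margin`.
    """
    reliable = []
    for idx, probs in enumerate(soft_assignments):
        ranked = sorted(probs, reverse=True)
        top = ranked[0]
        second = ranked[1] if len(ranked) > 1 else 0.0
        if top >= min_confidence and top - second >= min_margin:
            reliable.append(idx)
    return reliable


# Toy soft assignments for four nodes over three clusters.
soft = [
    [0.90, 0.05, 0.05],  # confident
    [0.40, 0.35, 0.25],  # ambiguous
    [0.10, 0.80, 0.10],  # confident
    [0.50, 0.45, 0.05],  # top two clusters too close
]
reliable = select_reliable_nodes(soft)
print(reliable)
```

Under this reading, only the selected nodes would contribute pseudo-labels to the clustering loss, while the remaining nodes would be left to the reconstruction objective until their assignments become more reliable.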


Results from the Paper


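| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Node Clustering | Citeseer | R-DGAE | Accuracy | 70.5 | # 1 |
| Node Clustering | Citeseer | R-DGAE | NMI | 45.0 | # 2 |
| Node Clustering | Citeseer | R-DGAE | ARI | 47.1 | # 1 |
| Graph Clustering | Citeseer | R-GMM-VGAE | ARI | 43.9 | # 2 |
| Graph Clustering | Citeseer | R-GMM-VGAE | NMI | 42.0 | # 2 |
| Graph Clustering | Citeseer | R-GMM-VGAE | ACC | 68.9 | # 2 |
| Graph Clustering | Citeseer | R-DGAE | ARI | 47.1 | # 1 |
| Graph Clustering | Citeseer | R-DGAE | NMI | 45.0 | # 1 |
| Graph Clustering | Citeseer | R-DGAE | ACC | 70.5 | # 1 |
| Node Clustering | Citeseer | R-GMM-VGAE | Accuracy | 68.9 | # 4 |
| Node Clustering | Citeseer | R-GMM-VGAE | NMI | 42.0 | # 5 |
| Node Clustering | Citeseer | R-GMM-VGAE | ARI | 43.9 | # 3 |
| Node Clustering | Cora | R-DGAE | Accuracy | 73.7 | # 3 |
| Node Clustering | Cora | R-DGAE | NMI | 56.0 | # 4 |
| Node Clustering | Cora | R-DGAE | ARI | 54.1 | # 2 |
| Node Clustering | Cora | R-GMM-VGAE | Accuracy | 76.7 | # 1 |
| Node Clustering | Cora | R-GMM-VGAE | NMI | 57.3 | # 2 |
| Node Clustering | Cora | R-GMM-VGAE | ARI | 57.9 | # 1 |
| Graph Clustering | Cora | R-GMM-VGAE | ARI | 57.9 | # 1 |
| Graph Clustering | Cora | R-GMM-VGAE | NMI | 57.3 | # 1 |
| Graph Clustering | Cora | R-GMM-VGAE | ACC | 76.7 | # 1 |
| Graph Clustering | Cora | R-DGAE | ARI | 54.1 | # 2 |
| Graph Clustering | Cora | R-DGAE | NMI | 56.0 | # 2 |
| Graph Clustering | Cora | R-DGAE | ACC | 73.7 | # 2 |
| Graph Clustering | Pubmed | R-DGAE | NMI | 34.4 | # 3 |
| Graph Clustering | Pubmed | R-DGAE | ARI | 34.6 | # 2 |
| Graph Clustering | Pubmed | R-DGAE | ACC | 71.4 | # 4 |
| Node Clustering | Pubmed | R-DGAE | Accuracy | 71.4 | # 3 |
| Node Clustering | Pubmed | R-DGAE | NMI | 34.4 | # 2 |
| Node Clustering | Pubmed | R-DGAE | ARI | 34.6 | # 2 |
| Node Clustering | Pubmed | R-GMM-VGAE | Accuracy | 74.0 | # 1 |
| Node Clustering | Pubmed | R-GMM-VGAE | NMI | 33.4 | # 3 |
| Node Clustering | Pubmed | R-GMM-VGAE | ARI | 37.9 | # 1 |
| Graph Clustering | Pubmed | R-GMM-VGAE | NMI | 33.4 | # 4 |
| Graph Clustering | Pubmed | R-GMM-VGAE | ARI | 37.9 | # 1 |
| Graph Clustering | Pubmed | R-GMM-VGAE | ACC | 74.0 | # 1 |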
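
The three metrics reported above (accuracy, NMI, ARI) compare predicted cluster labels with ground-truth classes. The following is a minimal standard-library sketch of how such scores are commonly computed, not the evaluation code behind these numbers. It assumes a brute-force best matching between clusters and classes for accuracy (practical only for a handful of clusters; the Hungarian algorithm is the usual choice at scale) and arithmetic-mean normalization for NMI; the paper's exact conventions are not stated here. All function names are illustrative.

```python
from collections import Counter
from itertools import permutations
from math import comb, log


def clustering_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one mapping from clusters to classes."""
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    pairs = Counter(zip(y_pred, y_true))
    # Pad with None so every cluster can be assigned a (possibly empty) target.
    targets = classes + [None] * max(0, len(clusters) - len(classes))
    best = 0
    for perm in permutations(targets, len(clusters)):
        hits = sum(pairs[(c, t)] for c, t in zip(clusters, perm))
        best = max(best, hits)
    return best / len(y_true)


def _entropy(labels):
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(y_true, y_pred):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(y_true)
    joint = Counter(zip(y_true, y_pred))
    count_t, count_p = Counter(y_true), Counter(y_pred)
    mi = sum(
        (c / n) * log(c * n / (count_t[t] * count_p[p]))
        for (t, p), c in joint.items()
    )
    denom = (_entropy(y_true) + _entropy(y_pred)) / 2
    return mi / denom if denom > 0 else 1.0


def adjusted_rand_index(y_true, y_pred):
    """Rand index corrected for chance, computed from pair counts."""
    n = len(y_true)
    sum_joint = sum(comb(c, 2) for c in Counter(zip(y_true, y_pred)).values())
    sum_t = sum(comb(c, 2) for c in Counter(y_true).values())
    sum_p = sum(comb(c, 2) for c in Counter(y_pred).values())
    expected = sum_t * sum_p / comb(n, 2)
    max_index = (sum_t + sum_p) / 2
    if max_index == expected:
        return 1.0
    return (sum_joint - expected) / (max_index - expected)


# Toy example: one node of class 0 is placed in the wrong cluster.
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1]
acc = clustering_accuracy(y_true, y_pred)
nmi = normalized_mutual_info(y_true, y_pred)
ari = adjusted_rand_index(y_true, y_pred)
print(acc, nmi, ari)
```

All three scores are invariant to how cluster IDs are numbered, which is why a plain label-by-label comparison would not be a suitable accuracy measure for clustering.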
