Feature Selection: Key to Enhance Node Classification with Graph Neural Networks

Graphs help to define the relationships between entities in the data. These relationships, represented by edges, often provide additional contextual information which can be utilised to discover patterns in the data. Graph Neural Networks (GNNs) employ the inductive bias of the graph structure to learn and predict on various tasks. The primary operation of a GNN is the feature aggregation step, performed over the neighbours of each node according to the structure of the graph. In addition to its own features, at each hop the node receives the combined features of its neighbours. These aggregated features help define the similarity or dissimilarity of the nodes with respect to the labels and are useful for tasks like node classification. However, in real-world data, features of neighbours at different hops may not correlate with the node's features. Thus, indiscriminate feature aggregation by a GNN might add noisy features and degrade the model's performance. In this work, we show that selective aggregation of node features from various hops leads to better performance than default aggregation on the node classification task. Furthermore, we propose a Dual-Net GNN architecture with a classifier model and a selector model. The classifier model trains over a subset of input node features to predict node labels, while the selector model learns to provide the optimal input subset to the classifier for the best performance. These two models are trained jointly to learn the subset of features that gives the highest accuracy in node label predictions. With extensive experiments, we show that our proposed model outperforms both feature selection methods and state-of-the-art GNN models, with improvements of up to 27.8%.
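
The difference between default and selective aggregation can be illustrated with a minimal sketch. This is not the Dual-Net GNN implementation: the function names, the mean aggregator, the toy graph, and the hand-picked mask are all assumptions made for illustration. In the paper the mask would be produced by the learned selector model rather than fixed by hand.

```python
def mean_aggregate(neighbours, feats):
    """One hop: each node receives the mean of its neighbours' features."""
    out = []
    for nbrs in neighbours:
        if not nbrs:
            # Isolated node: nothing to aggregate, so use zeros.
            out.append([0.0] * len(feats[0]))
            continue
        columns = zip(*(feats[j] for j in nbrs))
        out.append([sum(col) / len(nbrs) for col in columns])
    return out


def hop_features(neighbours, x, num_hops):
    """Return [0-hop, 1-hop, ..., num_hops-hop] feature matrices."""
    hops = [x]
    for _ in range(num_hops):
        hops.append(mean_aggregate(neighbours, hops[-1]))
    return hops


def select_and_concat(hops, mask):
    """Concatenate all hop features per node, keeping only masked-in columns."""
    rows = []
    for i in range(len(hops[0])):
        full = [value for hop in hops for value in hop[i]]
        rows.append([value for value, keep in zip(full, mask) if keep])
    return rows


# Toy path graph 0 - 1 - 2 with two features per node (assumed data).
neighbours = [[1], [0, 2], [1]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

hops = hop_features(neighbours, x, num_hops=2)

# Hypothetical mask: keep both own features and the first 1-hop feature,
# drop the second 1-hop feature and every 2-hop feature.
mask = [True, True, True, False, False, False]
selected = select_and_concat(hops, mask)
```

Default aggregation corresponds to a mask that is `True` everywhere, so the classifier sees every hop's features; selective aggregation passes only the chosen columns to the classifier.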


Results from the Paper


Task                 Dataset        Model         Metric Name  Metric Value  Global Rank
Node Classification  arXiv-year     Dual-Net GNN  Accuracy     62.65±0.39    #4
Node Classification  genius         Dual-Net GNN  Accuracy     91.45±0.11    #1
Node Classification  Penn94         Dual-Net GNN  Accuracy     86.09±0.56    #1
Node Classification  pokec          Dual-Net GNN  Accuracy     81.55±0.09    #5
Node Classification  snap-patents   Dual-Net GNN  Accuracy     70.22±0.44    #3
Node Classification  twitch-gamers  Dual-Net GNN  Accuracy     66.36±0.11    #1

Methods