Node Feature Extraction by Self-Supervised Multi-scale Neighborhood Prediction

Learning on graphs has attracted significant attention in the learning community due to numerous real-world applications. In particular, graph neural networks (GNNs), which take numerical node features and graph structure as inputs, have been shown to achieve state-of-the-art performance on various graph-related learning tasks. Recent works exploring the correlation between numerical node features and graph structure via self-supervised learning have paved the way for further performance improvements of GNNs. However, methods used for extracting numerical node features from raw data are still graph-agnostic within standard GNN pipelines. This practice is sub-optimal as it prevents one from fully utilizing potential correlations between graph topology and node attributes. To mitigate this issue, we propose a new self-supervised learning framework, Graph Information Aided Node feature exTraction (GIANT). GIANT makes use of the eXtreme Multi-label Classification (XMC) formalism, which is crucial for fine-tuning the language model based on graph information and which scales to large datasets. We also provide a theoretical analysis that justifies the use of XMC over link prediction and motivates integrating XR-Transformers, a powerful method for solving XMC problems, into the GIANT framework. We demonstrate the superior performance of GIANT over the standard GNN pipeline on Open Graph Benchmark datasets. For example, by leveraging GIANT on the ogbn-papers100M dataset, we improve the accuracy of the top-ranked method GAMLP from $68.25\%$ to $69.67\%$, SGC from $63.29\%$ to $66.10\%$, and MLP from $47.24\%$ to $61.10\%$.
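As a rough illustration of how neighborhood prediction can be cast as a multi-label classification problem, the sketch below builds one binary label vector per node from a toy edge list, so that each node's raw text is paired with the set of its neighbors as targets. The toy graph, the variable names, and the helper `neighborhood_labels` are hypothetical and not taken from the paper; the sketch covers only the label construction, not the XR-Transformer training itself.

```python
# Hypothetical toy graph: 4 nodes with raw text and undirected edges.
# These names and values are illustrative assumptions, not from the paper.
raw_text = [
    "graph neural networks",
    "extreme multi-label classification",
    "transformer language models",
    "node feature extraction",
]
edges = [(0, 1), (0, 3), (1, 2)]


def neighborhood_labels(num_nodes, edges):
    """Return one binary label vector per node: entry j is 1 if j is a neighbor."""
    labels = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        labels[u][v] = 1
        labels[v][u] = 1  # undirected graph: mark both directions
    return labels


labels = neighborhood_labels(len(raw_text), edges)

# Each (text, label vector) pair is one multi-label training instance:
# a text encoder would be fine-tuned to predict the node's neighbors.
for text, row in zip(raw_text, labels):
    print(f"{text!r} -> {row}")
```

On a real graph the label space has one entry per node, which is why the number of labels becomes extremely large and a scalable XMC solver is needed.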

ICLR 2022
| Task | Dataset | Model | Test Accuracy (global rank) | Validation Accuracy (global rank) | Number of params (global rank) | Ext. data (global rank) |
|---|---|---|---|---|---|---|
| Node Property Prediction | ogbn-arxiv | GIANT-XRT+RevGAT+KD (use raw text) | 0.7615 ± 0.0010 (#12) | 0.7716 ± 0.0009 (#10) | 1304912 (#39) | Yes (#1) |
| Node Property Prediction | ogbn-arxiv | GIANT-XRT+MLP (use raw text) | 0.7306 ± 0.0011 (#47) | 0.7432 ± 0.0009 (#48) | 273960 (#52) | Yes (#1) |
| Node Property Prediction | ogbn-arxiv | GIANT-XRT+GraphSAGE (use raw text) | 0.7435 ± 0.0014 (#18) | 0.7595 ± 0.0011 (#18) | 546344 (#48) | Yes (#1) |
| Node Property Prediction | ogbn-papers100M | GIANT-XRT+GAMLP+RLU (use raw text) | 0.6967 ± 0.0005 (#2) | 0.7305 ± 0.0004 (#2) | 21551631 (#6) | Yes (#1) |
| Node Property Prediction | ogbn-products | GIANT-XRT+SAGN+SLE (use raw text) | 0.8622 ± 0.0022 (#13) | 0.9363 ± 0.0005 (#11) | 1548382 (#22) | Yes (#1) |
| Node Property Prediction | ogbn-products | GIANT-XRT+SAGN+SLE+C&S (use raw text) | 0.8643 ± 0.0020 (#12) | 0.9352 ± 0.0005 (#13) | 1548382 (#22) | Yes (#1) |
| Node Property Prediction | ogbn-products | GIANT-XRT+MLP (use raw text) | 0.8049 ± 0.0028 (#42) | 0.9210 ± 0.0009 (#40) | 275759 (#41) | Yes (#1) |
| Node Property Prediction | ogbn-products | GIANT-XRT+GraphSAINT (use raw text) | 0.8415 ± 0.0022 (#25) | 0.9318 ± 0.0004 (#17) | 417583 (#38) | Yes (#1) |
