Recipe for a General, Powerful, Scalable Graph Transformer

We propose a recipe for building a general, powerful, scalable (GPS) graph Transformer with linear complexity and state-of-the-art results on a diverse set of benchmarks. Graph Transformers (GTs) have gained popularity in the field of graph representation learning through a variety of recent publications, but they lack a common foundation about what constitutes a good positional or structural encoding and what differentiates the two. In this paper, we summarize the different types of encodings with a clearer definition and categorize them as being $\textit{local}$, $\textit{global}$ or $\textit{relative}$. Prior GTs are constrained to small graphs with a few hundred nodes; here we propose the first architecture with a complexity linear in the number of nodes and edges, $O(N+E)$, by decoupling the local real-edge aggregation from the fully-connected Transformer. We argue that this decoupling does not negatively affect expressivity, with our architecture being a universal function approximator on graphs. Our GPS recipe consists of choosing 3 main ingredients: (i) positional/structural encoding, (ii) local message-passing mechanism, and (iii) global attention mechanism. We provide a modular framework, $\textit{GraphGPS}$, that supports multiple types of encodings and that provides efficiency and scalability on both small and large graphs. We test our architecture on 16 benchmarks and show highly competitive results on all of them, showcasing the empirical benefits gained from the modularity and the combination of different strategies.
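The decoupling described in the abstract can be illustrated with a small sketch. The code below is not the GraphGPS implementation; it is a minimal pure-Python illustration under several assumptions: node features `x` are taken to already include whatever positional/structural encoding was chosen for ingredient (i), the local step is a plain sum over real edges, the global step uses a kernelized linear attention with an `elu(x) + 1` feature map as a stand-in for a linear-complexity global attention, and the two branches are combined by a simple residual sum. All function names are hypothetical, and learned weights, normalization and the feed-forward block are omitted.

```python
import math


def local_sum_aggregate(x, edges):
    """Ingredient (ii), simplified: sum neighbour features over real edges.

    `edges` is a list of directed (src, dst) pairs; the cost is O(E * d).
    """
    out = [[0.0] * len(x[0]) for _ in x]
    for src, dst in edges:
        for k, value in enumerate(x[src]):
            out[dst][k] += value
    return out


def feature_map(row):
    """elu(t) + 1: a strictly positive feature map (an assumed choice)."""
    return [t + 1.0 if t > 0 else math.exp(t) for t in row]


def linear_global_attention(x):
    """Ingredient (iii), simplified: attention over all nodes in O(N * d^2).

    Queries, keys and values are all taken to be `x` itself (no learned
    projections). No N x N attention matrix is ever materialized.
    """
    d = len(x[0])
    phi = [feature_map(row) for row in x]
    kv = [[0.0] * d for _ in range(d)]  # sum over nodes of phi(k_j) v_j^T
    k_sum = [0.0] * d                   # sum over nodes of phi(k_j)
    for p, v in zip(phi, x):
        for a in range(d):
            k_sum[a] += p[a]
            for b in range(d):
                kv[a][b] += p[a] * v[b]
    out = []
    for q in phi:
        norm = sum(q[a] * k_sum[a] for a in range(d))
        out.append(
            [sum(q[a] * kv[a][b] for a in range(d)) / norm for b in range(d)]
        )
    return out


def gps_layer_sketch(x, edges):
    """Run the local and global branches independently, then sum them
    with the input as a residual (an assumed combination rule)."""
    local = local_sum_aggregate(x, edges)
    glob = linear_global_attention(x)
    return [
        [xi + li + gi for xi, li, gi in zip(row_x, row_l, row_g)]
        for row_x, row_l, row_g in zip(x, local, glob)
    ]
```

Calling `gps_layer_sketch(x, edges)` on a list of per-node feature lists and a list of directed edges returns updated features of the same shape. Because the local branch touches each edge once and the global branch touches each node once, the total work in this sketch grows as $O(N+E)$ for a fixed feature dimension.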


Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Graph Classification | CIFAR10 100k | GPS | Accuracy (%) | 72.298 | # 6 |
| Node Classification | CLUSTER | GPS | Accuracy | 77.95 | # 6 |
| Node Classification | COCO-SP | GPS | macro F1 | 0.3412 ± 0.0044 | # 3 |
| Graph Classification | MalNet-Tiny | GPS | Accuracy | 93.36 ± 0.6 | # 2 |
| Graph Classification | MNIST | GPS | Accuracy | 98.05 | # 5 |
| Graph Property Prediction | ogbg-code2 | GPS | Test F1 score | 0.1894 | # 5 |
| | | | Validation F1 score | 0.1739 ± 0.001 | # 5 |
| | | | Number of params | 12454066 | # 9 |
| | | | Ext. data | No | # 1 |
| Graph Property Prediction | ogbg-molhiv | GPS | Test ROC-AUC | 0.7880 | # 26 |
| | | | Validation ROC-AUC | 0.8255 ± 0.0092 | # 24 |
| | | | Number of params | 558625 | # 17 |
| | | | Ext. data | No | # 1 |
| Graph Property Prediction | ogbg-molpcba | GPS | Test AP | 0.2907 | # 18 |
| | | | Validation AP | 0.3015 ± 0.0038 | # 17 |
| | | | Number of params | 9744496 | # 9 |
| | | | Ext. data | No | # 1 |
| Graph Property Prediction | ogbg-ppa | GPS | Test Accuracy | 0.8015 | # 3 |
| | | | Validation Accuracy | 0.7556 ± 0.0027 | # 3 |
| | | | Number of params | 3434533 | # 5 |
| | | | Ext. data | No | # 1 |
| Node Classification | PascalVOC-SP | GPS | macro F1 | 0.3748 ± 0.0109 | # 4 |
| Node Classification | PATTERN | GPS | Accuracy | 86.685 | # 5 |
| Graph Regression | PCQM4Mv2-LSC | GPS | Validation MAE | 0.0852 | # 8 |
| | | | Test MAE | 0.0862 | # 7 |
| Link Prediction | PCQM-Contact | GPS | MRR | 0.3337 ± 0.0006 | # 11 |
| Graph Classification | Peptides-func | GPS | AP | 0.6535 ± 0.0041 | # 15 |
| Graph Regression | Peptides-struct | GPS | MAE | 0.2500 ± 0.0005 | # 12 |
| Graph Regression | ZINC | GINE | MAE | 0.070 ± 0.004 | # 5 |
| Graph Regression | ZINC | GPS | MAE | 0.070 ± 0.002 | # 5 |
| Graph Regression | ZINC-500k | GPS | MAE | 0.070 | # 6 |

Methods