Recipe for a General, Powerful, Scalable Graph Transformer

We propose a recipe for building a general, powerful, scalable (GPS) graph Transformer with linear complexity and state-of-the-art results on a diverse set of benchmarks. Graph Transformers (GTs) have gained popularity in graph representation learning, with a variety of recent publications, but they lack a common foundation about what constitutes a good positional or structural encoding and what differentiates them. In this paper, we summarize the different types of encodings with a clearer definition and categorize them as $\textit{local}$, $\textit{global}$, or $\textit{relative}$. Further, GTs remain constrained to small graphs with a few hundred nodes, and we propose the first architecture with complexity linear in the number of nodes and edges, $O(N+E)$, by decoupling the local real-edge aggregation from the fully-connected Transformer. We argue that this decoupling does not negatively affect expressivity, with our architecture being a universal function approximator for graphs. Our GPS recipe consists of choosing three main ingredients: (i) positional/structural encoding, (ii) local message-passing mechanism, and (iii) global attention mechanism. We build and open-source a modular framework, $\textit{GraphGPS}$, that supports multiple types of encodings and provides efficiency and scalability on both small and large graphs. We test our architecture on 11 benchmarks and show very competitive results on all of them, showcasing the empirical benefits gained by the modularity and the combination of different strategies.
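To make the three ingredients concrete, the following is a minimal NumPy sketch of one layer built from them, not the GraphGPS implementation. Several choices here are assumptions that the abstract does not specify: Laplacian eigenvectors stand in for a global positional encoding, the local step is a plain sum over neighbors, the two branches are combined by addition followed by a small MLP, and all function and parameter names (`laplacian_pe`, `local_mpnn`, `global_attention`, `gps_layer`, `W_local`, and so on) are hypothetical.

```python
import numpy as np

def laplacian_pe(num_nodes, edges, k):
    """Assumed global positional encoding: k low-frequency Laplacian eigenvectors."""
    A = np.zeros((num_nodes, num_nodes))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    L = np.diag(A.sum(axis=1)) - A       # combinatorial graph Laplacian
    _, vecs = np.linalg.eigh(L)          # eigenvalues returned in ascending order
    return vecs[:, 1:k + 1]              # skip the first (constant) eigenvector

def local_mpnn(X, edges, W):
    """Local branch: sum-aggregate transformed neighbor features over real edges, O(E)."""
    H = X @ W
    out = np.zeros_like(H)
    for u, v in edges:                   # undirected graph: pass messages both ways
        out[u] += H[v]
        out[v] += H[u]
    return out

def global_attention(X, Wq, Wk, Wv):
    """Global branch: dense softmax self-attention over all node pairs, O(N^2)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ V

def gps_layer(X, edges, p):
    """Run both branches on the same input, sum them, then apply a residual MLP."""
    X_local = X + local_mpnn(X, edges, p["W_local"])
    X_global = X + global_attention(X, p["Wq"], p["Wk"], p["Wv"])
    H = X_local + X_global               # assumed combination rule
    return H + np.maximum(H @ p["W1"], 0.0) @ p["W2"]

# Toy example: a 4-cycle with random node features plus a 2-dimensional encoding.
rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
n, d, k = 4, 8, 2
features = rng.normal(size=(n, d - k))
X = np.concatenate([features, laplacian_pe(n, edges, k)], axis=1)
names = ("W_local", "Wq", "Wk", "Wv", "W1", "W2")
params = {name: rng.normal(scale=0.1, size=(d, d)) for name in names}
out = gps_layer(X, edges, params)
print(out.shape)  # (4, 8)
```

The local branch touches each edge once, so its cost grows with $E$. The dense attention used here for readability costs $O(N^2)$, so reaching the stated $O(N+E)$ bound would require replacing `global_attention` with a linear-time attention mechanism. Because the two branches are decoupled, that swap leaves the local branch untouched.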


Results from the Paper

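| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Graph Classification | CIFAR10 100k | GPS | Accuracy (%) | 72.298 | # 3 |
| Node Classification | CLUSTER | GPS | Accuracy | 77.95 | # 2 |
| Graph Classification | MalNet-Tiny | GPS | Accuracy | 93.36 ± 0.6 | # 1 |
| Graph Classification | MNIST | GPS | Accuracy | 98.05 | # 2 |
| Graph Property Prediction | ogbg-code2 | GPS | Test F1 score | 0.1894 | # 2 |
| Graph Property Prediction | ogbg-code2 | GPS | Validation F1 score | 0.1739 ± 0.001 | # 2 |
| Graph Property Prediction | ogbg-code2 | GPS | Number of params | 12454066 | # 5 |
| Graph Property Prediction | ogbg-code2 | GPS | Ext. data | No | # 1 |
| Graph Property Prediction | ogbg-molhiv | GPS | Test ROC-AUC | 0.7880 | # 23 |
| Graph Property Prediction | ogbg-molhiv | GPS | Validation ROC-AUC | 0.8255 ± 0.0092 | # 22 |
| Graph Property Prediction | ogbg-molhiv | GPS | Number of params | 558625 | # 15 |
| Graph Property Prediction | ogbg-molhiv | GPS | Ext. data | No | # 1 |
| Graph Property Prediction | ogbg-molpcba | GPS | Test AP | 0.2907 | # 14 |
| Graph Property Prediction | ogbg-molpcba | GPS | Validation AP | 0.3015 ± 0.0038 | # 14 |
| Graph Property Prediction | ogbg-molpcba | GPS | Number of params | 9744496 | # 6 |
| Graph Property Prediction | ogbg-molpcba | GPS | Ext. data | No | # 1 |
| Graph Property Prediction | ogbg-ppa | GPS | Test Accuracy | 0.8015 | # 3 |
| Graph Property Prediction | ogbg-ppa | GPS | Validation Accuracy | 0.7556 ± 0.0027 | # 3 |
| Graph Property Prediction | ogbg-ppa | GPS | Number of params | 3434533 | # 5 |
| Graph Property Prediction | ogbg-ppa | GPS | Ext. data | No | # 1 |
| Node Classification | PATTERN | GPS | Accuracy | 90.324 | # 1 |
| Graph Regression | PCQM4Mv2-LSC | GraphGPS | Validation MAE | 0.0858 | # 3 |
| Graph Regression | ZINC | GINE | MAE | 0.070 ± 0.004 | # 1 |
| Graph Regression | ZINC | GPS | MAE | 0.070 ± 0.002 | # 1 |
| Graph Regression | ZINC-500k | GPS | MAE | 0.070 | # 2 |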