Edge-augmented Graph Transformers: Global Self-attention is Enough for Graphs

7 Aug 2021 · Md Shamim Hussain, Mohammed J. Zaki, Dharmashankar Subramanian

Transformer neural networks have achieved state-of-the-art results for unstructured data such as text and images, but their adoption for graph-structured data has been limited. This is partly due to the difficulty of incorporating complex structural information into the basic transformer framework. We propose a simple yet powerful extension to the transformer: residual edge channels. The resulting framework, which we call the Edge-augmented Graph Transformer (EGT), can directly accept, process, and output structural information as well as node information. It allows us to use global self-attention, the key element of transformers, directly for graphs, and brings the benefit of long-range interaction among nodes. Moreover, the edge channels allow the structural information to evolve from layer to layer, and prediction tasks on edges/links can be performed directly from the output embeddings of these channels. In addition, we introduce a generalized positional encoding scheme for graphs based on Singular Value Decomposition, which can further improve the performance of EGT. Our framework, which relies on global node feature aggregation, achieves better performance than Convolutional/Message-Passing Graph Neural Networks, which rely on local feature aggregation within a neighborhood. We verify the performance of EGT in a supervised learning setting on a wide range of benchmark datasets. Our findings indicate that convolutional aggregation is not an essential inductive bias for graphs and that global self-attention can serve as a flexible and adaptive alternative.
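
The core idea, edge channels that bias and gate global self-attention and are themselves updated from the attention logits, can be illustrated with a short sketch. This is a minimal single-head illustration based only on the abstract's description, not the authors' reference implementation; the tensor layout, the sigmoid gating, and the exact residual edge update below are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EGTLayerSketch(nn.Module):
    """Hypothetical single-head edge-augmented attention layer.

    h: node embeddings, shape (batch, n, d_node)
    e: edge embeddings, shape (batch, n, n, d_edge)
    Edge channels bias and gate the attention logits, and are updated
    residually from those logits, so structure evolves layer to layer.
    """
    def __init__(self, d_node: int, d_edge: int):
        super().__init__()
        self.q = nn.Linear(d_node, d_node)
        self.k = nn.Linear(d_node, d_node)
        self.v = nn.Linear(d_node, d_node)
        self.edge_bias = nn.Linear(d_edge, 1)  # additive bias on logits
        self.edge_gate = nn.Linear(d_edge, 1)  # sigmoid gate on attention
        self.edge_out = nn.Linear(1, d_edge)   # logits -> edge update
        self.scale = d_node ** -0.5

    def forward(self, h, e):
        q, k, v = self.q(h), self.k(h), self.v(h)
        # Raw global attention logits between every pair of nodes.
        logits = torch.einsum('bid,bjd->bij', q, k) * self.scale
        # Edge channels inject structural information as an additive bias.
        logits = logits + self.edge_bias(e).squeeze(-1)
        # Gating lets edge channels modulate how much flows between nodes.
        gates = torch.sigmoid(self.edge_gate(e)).squeeze(-1)
        attn = F.softmax(logits, dim=-1) * gates
        h = h + torch.einsum('bij,bjd->bid', attn, v)  # residual node update
        e = e + self.edge_out(logits.unsqueeze(-1))    # residual edge update
        return h, e
```

Because the edge tensor is updated residually at every layer, edge/link predictions can be read directly from its final embeddings, as the abstract describes.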

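The SVD-based positional encoding admits a similarly short sketch. The idea is to factor the (possibly asymmetric) adjacency matrix and use scaled singular vectors as per-node coordinates; the rank r and the square-root scaling here are assumptions, and in practice such spectral encodings are usually combined with a sign-flip augmentation during training, since singular vectors are only defined up to sign.

```python
import torch

def svd_positional_encoding(adj: torch.Tensor, r: int = 8) -> torch.Tensor:
    """Rank-r SVD positional encodings for a graph (hypothetical sketch).

    adj: dense adjacency matrix, shape (n, n).
    Returns per-node encodings of shape (n, 2*r): the top-r left and
    right singular vectors, each scaled by sqrt of the singular values.
    Unlike Laplacian eigenvectors, no symmetry is required, so this
    also generalizes to directed graphs.
    """
    u, s, vh = torch.linalg.svd(adj)
    u_r, s_r, v_r = u[:, :r], s[:r], vh[:r, :].T
    scale = s_r.sqrt().unsqueeze(0)  # (1, r), weights vectors by importance
    return torch.cat([u_r * scale, v_r * scale], dim=-1)
```

These encodings would then be fed in alongside the initial node features, giving the global attention a notion of each node's position in the graph.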

Results from the Paper


Task                  Dataset                Model       Metric          Value            Global Rank
Graph Classification  CIFAR10 100k           EGT         Accuracy (%)    67.004           #5
Node Classification   CLUSTER                EGT         Accuracy (%)    77.909 ± 0.245   #1
Graph Classification  MNIST                  EGT         Accuracy (%)    97.722           #2
Graph Classification  MNIST                  EGT-Simple  Accuracy (%)    97.94            #1
Node Classification   PATTERN                EGT         Accuracy (%)    86.856 ± 0.013   #1
Node Classification   PATTERN 100k           EGT         Accuracy (%)    86.827 ± 0.037   #1
Graph Regression      PCQM4M-LSC             EGT         Validation MAE  0.1263           #2
Link Prediction       TSP/HCP Benchmark set  EGT         F1              84.5             #1
Graph Regression      ZINC                   EGT         MAE             0.154            #6
Graph Regression      ZINC 100k              EGT         MAE             0.171            #2
Graph Regression      ZINC-500k              EGT         MAE             0.154            #13
