Global Attention Improves Graph Networks Generalization

14 Jun 2020 · Omri Puny, Heli Ben-Hamu, Yaron Lipman

This paper advocates incorporating a Low-Rank Global Attention (LRGA) module, a computation- and memory-efficient variant of dot-product attention (Vaswani et al., 2017), into Graph Neural Networks (GNNs) to improve their generalization power. To theoretically quantify the generalization properties granted by adding the LRGA module to GNNs, we focus on a specific family of expressive GNNs and show that augmenting it with LRGA provides algorithmic alignment to a powerful graph isomorphism test, namely the 2-Folklore Weisfeiler-Lehman (2-FWL) algorithm. In more detail, we: (i) consider the recent Random Graph Neural Network (RGNN) (Sato et al., 2020) framework and prove that it is universal in probability; (ii) show that RGNN augmented with LRGA aligns with the 2-FWL update step via polynomial kernels; and (iii) bound the sample complexity of the kernel's feature map when learned with a randomly initialized two-layer MLP. From a practical point of view, augmenting existing GNN layers with LRGA produces state-of-the-art results on current GNN benchmarks. Lastly, we observe that augmenting various GNN architectures with LRGA often closes the performance gap between different models.
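To illustrate why a low-rank attention matrix can be cheap in both computation and memory, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the attention matrix factors as `U @ V.T` with `U` and `V` of shape `(n, k)` and `k` much smaller than `n`. The names `U`, `V`, `Z`, `low_rank_attention`, and `full_attention` are hypothetical, and the sketch omits any normalization and the learned maps that would produce these matrices from node features.

```python
import numpy as np


def low_rank_attention(U, V, Z):
    """Apply the rank-k matrix U @ V.T to Z without ever forming it.

    V.T @ Z has shape (k, d), so the cost is O(n * k * d) and no
    n x n matrix is stored.
    """
    return U @ (V.T @ Z)


def full_attention(U, V, Z):
    """Reference version that materializes the n x n attention matrix."""
    return (U @ V.T) @ Z


# Illustrative sizes (assumptions): n nodes, rank k, feature width d.
rng = np.random.default_rng(0)
n, k, d = 200, 8, 16
U = rng.random((n, k))
V = rng.random((n, k))
Z = rng.random((n, d))

out_low = low_rank_attention(U, V, Z)
out_full = full_attention(U, V, Z)
```

Both functions return the same result by associativity of matrix multiplication; only the order of the products, and therefore the size of the intermediate, differs.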


Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Link Property Prediction | ogbl-collab | PLNLP+ LRGA | Test Hits@50 | 0.6909 ± 0.0055 | # 7 |
| | | | Validation Hits@50 | 1.0000 ± 0.0000 | # 1 |
| | | | Number of params | 35200656 | # 5 |
| | | | Ext. data | No | # 1 |
| Link Property Prediction | ogbl-collab | LRGA + GCN | Test Hits@50 | 0.5221 ± 0.0072 | # 21 |
| | | | Validation Hits@50 | 0.6088 ± 0.0059 | # 19 |
| | | | Number of params | 1069489 | # 14 |
| | | | Ext. data | No | # 1 |
| Link Property Prediction | ogbl-ddi | LRGA + GCN | Test Hits@20 | 0.6230 ± 0.0912 | # 18 |
| | | | Validation Hits@20 | 0.6675 ± 0.0058 | # 21 |
| | | | Number of params | 1576081 | # 15 |
| | | | Ext. data | No | # 1 |
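The Hits@K values reported above can be read with the following sketch in mind. It assumes the common link-prediction definition in which a positive edge counts as a hit when its score is strictly higher than the K-th highest score among the negative edges; the page itself does not spell out the definition, so treat this as an assumption. The function name `hits_at_k` and the toy scores are hypothetical.

```python
import numpy as np


def hits_at_k(pos_scores, neg_scores, k):
    """Fraction of positive edges scored above the k-th best negative edge."""
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    if len(neg) < k:
        # Assumption: with fewer than k negatives, every positive is a hit.
        return 1.0
    kth_best_negative = np.sort(neg)[-k]
    return float(np.mean(pos > kth_best_negative))


# Toy scores (made up for illustration only).
pos = [0.9, 0.6, 0.2]
neg = [0.8, 0.5, 0.3, 0.1]
h2 = hits_at_k(pos, neg, k=2)  # threshold is 0.5, so two of three positives hit
h1 = hits_at_k(pos, neg, k=1)  # threshold is 0.8, so one of three positives hits
```

Under this definition a larger K lowers the threshold, which is why Hits@50 and Hits@20 on different datasets are not directly comparable.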

Methods