An end-to-end attention-based approach for learning on graphs

16 Feb 2024  ·  David Buterez, Jon Paul Janet, Dino Oglic, Pietro Liò

There has been a recent surge in transformer-based architectures for learning on graphs, mainly motivated by attention as an effective learning mechanism and the desire to supersede handcrafted operators characteristic of message passing schemes. However, concerns have been raised over the empirical effectiveness, scalability, and pre-processing complexity of these architectures, especially in relation to much simpler graph neural networks that typically perform on par with them across a wide range of benchmarks. To tackle these shortcomings, we consider graphs as sets of edges and propose a purely attention-based approach consisting of an encoder and an attention pooling mechanism. The encoder vertically interleaves masked and vanilla self-attention modules to learn effective representations of edges, while also allowing the model to tackle possible misspecifications in input graphs. Despite its simplicity, the approach outperforms fine-tuned message passing baselines and recently proposed transformer-based methods on more than 70 node- and graph-level tasks, including challenging long-range benchmarks. Moreover, we demonstrate state-of-the-art performance across different tasks, ranging from molecular to vision graphs and heterophilous node classification. The approach also outperforms graph neural networks and transformers in transfer learning settings, and scales much better than alternatives with a similar performance level or expressive power.
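
The architecture described above admits a compact sketch. The PyTorch snippet below is a minimal illustration of the edge-set-attention idea, written under assumptions of mine rather than taken from the authors' reference implementation: each edge becomes a token built from its two endpoint features plus its edge features, the encoder alternates masked self-attention (restricted here, as an assumed connectivity rule, to pairs of edges that share a node) with vanilla self-attention, and learnable seed queries pool the resulting edge set into a fixed-size graph representation. All class names, dimensions, and the exact interleaving pattern are illustrative.

```python
# Minimal sketch of an edge-set-attention model (hypothetical names and
# hyperparameters; not the authors' reference implementation).
import torch
import torch.nn as nn


class AttentionBlock(nn.Module):
    """Pre-norm multi-head self-attention over edge tokens.

    With `mask` given, attention is restricted to pairs of edges that
    share an endpoint (an assumed masking rule); with `mask=None` the
    block is a vanilla self-attention module.
    """

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, 4 * dim),
            nn.GELU(),
            nn.Linear(4 * dim, dim),
        )

    def forward(self, x, mask=None):
        h = self.norm(x)
        # In PyTorch, a True entry in a boolean attn_mask blocks attention.
        a, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + a
        return x + self.ff(x)


class EdgeSetAttention(nn.Module):
    def __init__(self, node_dim, edge_dim, dim=128, layers=4, heads=8, seeds=4):
        super().__init__()
        self.embed = nn.Linear(2 * node_dim + edge_dim, dim)
        self.blocks = nn.ModuleList(AttentionBlock(dim, heads) for _ in range(layers))
        # Attention pooling: learnable seed queries attend over all edge
        # tokens (a Set-Transformer-style readout, assumed here).
        self.seeds = nn.Parameter(torch.randn(seeds, dim))
        self.pool = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.out = nn.Linear(dim, 1)

    def forward(self, node_feats, edge_feats, edge_index):
        # One token per edge: [source node || target node || edge features].
        src, dst = edge_index
        tokens = self.embed(
            torch.cat([node_feats[src], node_feats[dst], edge_feats], dim=-1)
        ).unsqueeze(0)  # (1, E, dim): a single graph as a set of E edges

        # Block attention between edges that do NOT share a node.
        shares_node = (
            (src[:, None] == src[None, :])
            | (src[:, None] == dst[None, :])
            | (dst[:, None] == src[None, :])
            | (dst[:, None] == dst[None, :])
        )
        mask = ~shares_node  # (E, E), True = masked out

        # Vertically interleave masked and vanilla self-attention.
        for i, block in enumerate(self.blocks):
            tokens = block(tokens, mask if i % 2 == 0 else None)

        # Pool the edge set into a fixed-size graph-level prediction.
        pooled, _ = self.pool(self.seeds.unsqueeze(0), tokens, tokens)
        return self.out(pooled.mean(dim=1))  # (1, 1)


if __name__ == "__main__":
    # Toy usage: a 4-node graph with 5 directed edges.
    x = torch.randn(4, 16)                # node features
    ei = torch.tensor([[0, 1, 2, 3, 0],
                       [1, 2, 3, 0, 2]])  # edge index, shape (2, E)
    ea = torch.randn(5, 8)                # edge features
    model = EdgeSetAttention(node_dim=16, edge_dim=8)
    print(model(x, ea, ei).shape)         # torch.Size([1, 1])
```

Note the design choice this sketch tries to capture: graph structure enters only as an attention mask in the masked blocks, rather than as a handcrafted aggregation operator, which is the contrast with message passing drawn in the abstract.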

Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Graph Classification | CIFAR10 100k | ESA (Edge set attention, no positional encodings) | Accuracy (%) | 75.413±0.248 | # 6 |
| Graph Classification | D&D | ESA (Edge set attention, no positional encodings) | Accuracy | 83.529±1.743 | # 3 |
| Graph Classification | ENZYMES | ESA (Edge set attention, no positional encodings) | Accuracy | 79.423±1.658 | # 1 |
| Molecular Property Prediction | ESOL | ESA (Edge set attention, no positional encodings) | RMSE | 0.485±0.009 | # 1 |
| Molecular Property Prediction | ESOL | ESA (Edge set attention, no positional encodings) | R² | 0.944±0.002 | # 1 |
| Graph Regression | ESR2 | ESA (Edge set attention, no positional encodings) | R² | 0.697±0.000 | # 1 |
| Graph Regression | ESR2 | ESA (Edge set attention, no positional encodings) | RMSE | 0.486 | # 1 |
| Graph Regression | F2 | ESA (Edge set attention, no positional encodings) | R² | 0.891±0.000 | # 1 |
| Graph Regression | F2 | ESA (Edge set attention, no positional encodings) | RMSE | 0.335 | # 1 |
| Molecular Property Prediction | FreeSolv | ESA (Edge set attention, no positional encodings) | RMSE | 0.595±0.013 | # 1 |
| Molecular Property Prediction | FreeSolv | ESA (Edge set attention, no positional encodings) | R² | 0.977±0.001 | # 1 |
| Graph Classification | IMDb-B | ESA (Edge set attention, no positional encodings) | Accuracy | 86.250±0.957 | # 2 |
| Graph Regression | KIT | ESA (Edge set attention, no positional encodings) | R² | 0.841±0.000 | # 2 |
| Graph Regression | KIT | ESA (Edge set attention, no positional encodings) | RMSE | 0.433 | # 2 |
| Graph Regression | Lipophilicity | ESA (Edge set attention, no positional encodings) | RMSE | 0.552±0.012 | # 5 |
| Graph Regression | Lipophilicity | ESA (Edge set attention, no positional encodings) | R² | 0.809±0.008 | # 5 |
| Graph Classification | MalNet-Tiny | ESA (Edge set attention, no positional encodings) | Accuracy | 94.800±0.424 | # 1 |
| Graph Classification | MalNet-Tiny | ESA (Edge set attention, no positional encodings) | MCC | 0.935±0.005 | # 1 |
| Graph Classification | MNIST | ESA (Edge set attention, no positional encodings, tuned) | Accuracy | 98.917±0.020 | # 1 |
| Graph Classification | MNIST | ESA (Edge set attention, no positional encodings) | Accuracy | 98.753±0.041 | # 3 |
| Graph Classification | NCI1 | ESA (Edge set attention, no positional encodings) | Accuracy | 87.835±0.644 | # 2 |
| Graph Classification | NCI109 | ESA (Edge set attention, no positional encodings) | Accuracy | 84.976±0.551 | # 3 |
| Graph Regression | PARP1 | ESA (Edge set attention, no positional encodings) | R² | 0.925±0.000 | # 1 |
| Graph Regression | PARP1 | ESA (Edge set attention, no positional encodings) | RMSE | 0.343 | # 1 |
| Graph Regression | PCQM4Mv2-LSC | ESA (Edge set attention, no positional encodings) | Validation MAE | 0.0235 | # 1 |
| Graph Regression | PCQM4Mv2-LSC | ESA (Edge set attention, no positional encodings) | Test MAE | N/A | # 14 |
| Graph Classification | Peptides-func | ESA (Edge set attention, no positional encodings, not tuned) | AP | 0.6863±0.0044 | # 19 |
| Graph Classification | Peptides-func | ESA + RWSE (Edge set attention, Random Walk Structural Encoding, + validation set) | AP | 0.7479 | # 1 |
| Graph Classification | Peptides-func | ESA + RWSE (Edge set attention, Random Walk Structural Encoding, tuned) | AP | 0.7357±0.0036 | # 3 |
| Graph Classification | Peptides-func | ESA (Edge set attention, no positional encodings, tuned) | AP | 0.7071±0.0015 | # 13 |
| Graph Regression | Peptides-struct | ESA (Edge set attention, no positional encodings, not tuned) | MAE | 0.2453±0.0003 | # 7 |
| Graph Regression | Peptides-struct | ESA + RWSE (Edge set attention, Random Walk Structural Encoding, tuned) | MAE | 0.2393±0.0004 | # 1 |
| Graph Regression | PGR | ESA (Edge set attention, no positional encodings) | R² | 0.725±0.000 | # 1 |
| Graph Regression | PGR | ESA (Edge set attention, no positional encodings) | RMSE | 0.507 | # 1 |
| Graph Classification | PROTEINS | ESA (Edge set attention, no positional encodings) | Accuracy | 82.679±0.799 | # 4 |
| Graph Regression | ZINC | ESA + rings + NodeRWSE + EdgeRWSE | MAE | 0.051 | # 1 |
| Graph Regression | ZINC-500k | ESA + rings + NodeRWSE + EdgeRWSE | MAE | 0.051 | # 1 |
| Graph Regression | ZINC-full | ESA + RWSE + CY2C (Edge set attention, Random Walk Structural Encoding, clique adjacency, tuned) | Test MAE | 0.0122±0.0004 | # 2 |
| Graph Regression | ZINC-full | ESA + RWSE (Edge set attention, Random Walk Structural Encoding) | Test MAE | 0.017±0.001 | # 5 |
| Graph Regression | ZINC-full | ESA (Edge set attention, no positional encodings) | Test MAE | 0.027±0.001 | # 9 |
| Graph Regression | ZINC-full | ESA + RWSE (Edge set attention, Random Walk Structural Encoding, tuned) | Test MAE | 0.0154±0.0001 | # 4 |
| Graph Regression | ZINC-full | ESA + rings + NodeRWSE + EdgeRWSE | Test MAE | 0.0109±0.0002 | # 1 |

Methods