Pure Transformers are Powerful Graph Learners

We show that standard Transformers without graph-specific modifications can lead to promising results in graph learning, both in theory and in practice. Given a graph, we simply treat all nodes and edges as independent tokens, augment them with token embeddings, and feed them to a Transformer. With an appropriate choice of token embeddings, we prove that this approach is theoretically at least as expressive as an invariant graph network (2-IGN) composed of equivariant linear layers, which is already more expressive than all message-passing Graph Neural Networks (GNNs). When trained on a large-scale graph dataset (PCQM4Mv2), our method, coined Tokenized Graph Transformer (TokenGT), achieves significantly better results than GNN baselines and results competitive with Transformer variants that use sophisticated graph-specific inductive biases. Our implementation is available at https://github.com/jw9730/tokengt.
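The tokenization described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the TokenGT implementation: the helper names (`one_hot`, `tokenize_graph`, `self_attention`), the use of one-hot vectors as orthonormal node identifiers, the token layout (node v as [x_v, P_v, P_v, node type], edge (u, v) as [x_uv, P_u, P_v, edge type]), and the parameter-free attention head are all assumptions chosen for brevity.

```python
import math


def one_hot(i, n):
    """Node identifier (assumed): the i-th standard basis vector, so identifiers are orthonormal."""
    return [1.0 if k == i else 0.0 for k in range(n)]


def tokenize_graph(node_feats, edges, edge_feats):
    """Flatten a graph into one token per node and one token per edge.

    Hypothetical layout, for illustration only:
      node v      -> [x_v,  P_v, P_v, 1, 0]
      edge (u, v) -> [x_uv, P_u, P_v, 0, 1]
    Node and edge feature vectors must have the same width.
    """
    n = len(node_feats)
    P = [one_hot(i, n) for i in range(n)]
    type_node, type_edge = [1.0, 0.0], [0.0, 1.0]  # assumed type identifiers
    tokens = [x + P[v] + P[v] + type_node for v, x in enumerate(node_feats)]
    tokens += [x + P[u] + P[v] + type_edge for (u, v), x in zip(edges, edge_feats)]
    return tokens


def self_attention(tokens):
    """One parameter-free attention head (Q = K = V = tokens); no learned weights."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * t[j] for w, t in zip(weights, tokens)) / z for j in range(d)])
    return out


# Path graph 0 - 1 - 2 with 1-dimensional node and edge features.
node_feats = [[0.5], [1.0], [1.5]]
edges = [(0, 1), (1, 2)]
edge_feats = [[0.1], [0.2]]
tokens = tokenize_graph(node_feats, edges, edge_feats)
print(len(tokens), len(tokens[0]))  # 3 node tokens + 2 edge tokens, width 1 + 3 + 3 + 2 = 9
out = self_attention(tokens)
```

Because each token carries the identifiers of the node or nodes it touches, attention can in principle recover which edges meet at which nodes, even though the Transformer itself sees only an unordered set of tokens.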


Results from the Paper


Task                           Dataset          Model    Metric          Value          Global rank
Graph Classification           D&D              TokenGT  Accuracy        73.950±3.361   #48
Molecular Property Prediction  ESOL             TokenGT  RMSE            0.667±0.103    #12
Molecular Property Prediction  ESOL             TokenGT  R2              0.892±0.036    #10
Graph Regression               ESR2             TokenGT  R2              0.641±0.000    #8
Graph Regression               ESR2             TokenGT  RMSE            0.529±0.641    #8
Graph Regression               F2               TokenGT  R2              0.872±0.000    #8
Graph Regression               F2               TokenGT  RMSE            0.363±0.872    #8
Molecular Property Prediction  FreeSolv         TokenGT  RMSE            1.038±0.125    #8
Molecular Property Prediction  FreeSolv         TokenGT  R2              0.930±0.018    #8
Graph Classification           IMDb-B           TokenGT  Accuracy        80.250±3.304   #6
Graph Regression               KIT              TokenGT  R2              0.800±0.000    #8
Graph Regression               KIT              TokenGT  RMSE            0.486±0.800    #8
Graph Regression               Lipophilicity    TokenGT  RMSE            0.852±0.023    #16
Graph Regression               Lipophilicity    TokenGT  R2              0.545±0.024    #10
Graph Classification           NCI1             TokenGT  Accuracy        76.740±2.054   #46
Graph Classification           NCI109           TokenGT  Accuracy        72.077±1.883   #33
Graph Regression               PARP1            TokenGT  R2              0.907±0.000    #8
Graph Regression               PARP1            TokenGT  RMSE            0.383±0.907    #8
Graph Regression               PCQM4Mv2-LSC     TokenGT  Validation MAE  0.0910         #17
Graph Regression               PCQM4Mv2-LSC     TokenGT  Test MAE        0.0919         #10
Graph Regression               Peptides-struct  TokenGT  MAE             0.2489±0.0013  #21
Graph Regression               PGR              TokenGT  R2              0.684±0.000    #5
Graph Regression               PGR              TokenGT  RMSE            0.543±0.684    #5
Graph Regression               ZINC-full        TokenGT  Test MAE        0.047±0.010    #13
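For reference, the regression metrics in the table (MAE, RMSE, R2) are shown below using their standard textbook definitions; lower MAE and RMSE are better, while higher R2 and accuracy are better. This is a minimal sketch with hypothetical helper names; it does not reproduce the evaluation code or data splits behind the reported numbers, and the toy targets and predictions are made up for illustration.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy example (illustrative values only).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
print(mae(y_true, y_pred))   # 0.25
print(rmse(y_true, y_pred))  # sqrt(0.125), about 0.354
print(r2(y_true, y_pred))    # 1 - 0.5 / 5.0 = 0.9
```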

Methods