Inductive Representation Learning on Large Graphs

Low-dimensional embeddings of nodes in large graphs have proved extremely useful in a variety of prediction tasks, from content recommendation to identifying protein functions. However, most existing approaches require that all nodes in the graph are present during training of the embeddings; these previous approaches are inherently transductive and do not naturally generalize to unseen nodes. Here we present GraphSAGE, a general, inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data. Instead of training individual embeddings for each node, we learn a function that generates embeddings by sampling and aggregating features from a node's local neighborhood. Our algorithm outperforms strong baselines on three inductive node-classification benchmarks: we classify the category of unseen nodes in evolving information graphs based on citation and Reddit post data, and we show that our algorithm generalizes to completely unseen graphs using a multi-graph dataset of protein-protein interactions.
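The abstract's core idea is to learn a function that samples and aggregates neighbor features instead of a per-node embedding table. It can be sketched in plain Python. This is a minimal, untrained illustration under stated assumptions. It uses a mean aggregator over a uniform, fixed-size neighbor sample drawn with replacement. Each node's own representation is concatenated with the aggregated neighborhood, passed through a linear map and ReLU, and L2-normalized. The helper names (`sage_embed`, `init_weights`, `sample_neighbors`, and so on), the random weights and the toy graph are all hypothetical. A practical implementation would train the weights and compute embeddings in sampled minibatches rather than over every node at each layer.

```python
import math
import random

def relu(x):
    return [max(0.0, v) for v in x]

def matvec(W, x):
    # W is a list of rows; each row has len(x) entries.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def l2_normalize(x):
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm > 0 else list(x)

def sample_neighbors(adj, v, num_samples, rng):
    # Assumption of this sketch: uniform fixed-size sample, with replacement.
    neigh = adj[v]
    if not neigh:
        return []
    return [rng.choice(neigh) for _ in range(num_samples)]

def mean_aggregate(vectors, dim):
    # Element-wise mean of the sampled neighbor vectors (zeros if none).
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def init_weights(in_dim, hidden_dims, rng):
    # Hypothetical random weights; layer k maps 2*d_{k-1} -> d_k.
    weights, d = [], in_dim
    for out in hidden_dims:
        weights.append([[rng.gauss(0.0, 0.5) for _ in range(2 * d)]
                        for _ in range(out)])
        d = out
    return weights

def sage_embed(adj, features, weights, sample_sizes, rng):
    """K-layer sample-and-aggregate forward pass over every node in adj."""
    h = {v: list(x) for v, x in features.items()}
    for W, s in zip(weights, sample_sizes):
        dim = len(next(iter(h.values())))
        new_h = {}
        for v in adj:
            sampled = sample_neighbors(adj, v, s, rng)
            h_neigh = mean_aggregate([h[u] for u in sampled], dim)
            # CONCAT(own representation, neighborhood), then W, ReLU, L2 norm.
            new_h[v] = l2_normalize(relu(matvec(W, h[v] + h_neigh)))
        h = new_h
    return h

rng = random.Random(0)
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
features = {0: [1.0, 0.0, 0.5], 1: [0.0, 1.0, 0.5],
            2: [1.0, 1.0, 0.0], 3: [0.2, 0.3, 0.9]}
weights = init_weights(3, [4, 2], rng)
emb = sage_embed(adj, features, weights, [2, 2], rng)

# Inductive use: unseen node 4 arrives with its own features and an edge to 3.
# The same weights embed it without retraining.
adj[4] = [3]
adj[3] = adj[3] + [4]
features[4] = [0.5, 0.5, 0.5]
emb_new = sage_embed(adj, features, weights, [2, 2], rng)
print(emb_new[4])
```

The weights are shared across nodes and are not tied to node identities. As a result, the same `weights` can embed node 4 even though it was absent when they were created. This is the inductive property the abstract describes. The neighbor sampling is random, so repeated calls can return different embeddings for the same node.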

Published at NeurIPS 2017.

Results from the Paper


| Task | Dataset | Model | Metric | Value | Global rank |
|---|---|---|---|---|---|
| Node Classification | Brazil Air-Traffic | GraphSAGE (Hamilton et al., 2017a) | Accuracy | 0.404 | #6 |
| Graph Classification | CIFAR10 100k | GraphSage | Accuracy (%) | 66.08 | #6 |
| Node Classification | CiteSeer (0.5%) | GraphSAGE | Accuracy | 33.8% | #15 |
| Node Classification | CiteSeer (1%) | GraphSAGE | Accuracy | 51.0% | #14 |
| Node Classification | Citeseer Full-supervised | GraphSAGE | Accuracy | 71.40% | #7 |
| Node Classification | CiteSeer with Public Split: fixed 20 nodes per class | GraphSAGE | Accuracy | 67.2% | #30 |
| Node Classification | Cora (0.5%) | GraphSAGE | Accuracy | 37.5% | #15 |
| Node Classification | Cora (1%) | GraphSAGE | Accuracy | 49.0% | #14 |
| Node Classification | Cora (3%) | GraphSAGE | Accuracy | 64.2% | #14 |
| Node Classification | Cora Full-supervised | GraphSAGE | Accuracy | 82.20% | #7 |
| Node Classification | Cora with Public Split: fixed 20 nodes per class | GraphSAGE | Accuracy | 74.5% | #32 |
| Link Property Prediction | ogbl-citation2 | NeighborSampling (SAGE aggr) | Test MRR | 0.8044 ± 0.0010 | #7 |
| | | | Validation MRR | 0.8054 ± 0.0009 | #7 |
| | | | Number of params | 460289 | #5 |
| | | | Ext. data | No | #1 |
| Link Property Prediction | ogbl-citation2 | Full-batch GraphSAGE | Test MRR | 0.8260 ± 0.0036 | #6 |
| | | | Validation MRR | 0.8263 ± 0.0033 | #6 |
| | | | Number of params | 460289 | #5 |
| | | | Ext. data | No | #1 |
| Link Property Prediction | ogbl-collab | GraphSAGE (val as input) | Test Hits@50 | 0.5463 ± 0.0112 | #10 |
| | | | Validation Hits@50 | 0.5688 ± 0.0077 | #13 |
| | | | Number of params | 460289 | #11 |
| | | | Ext. data | No | #1 |
| Link Property Prediction | ogbl-collab | GraphSAGE | Test Hits@50 | 0.4810 ± 0.0081 | #15 |
| | | | Validation Hits@50 | 0.5688 ± 0.0077 | #13 |
| | | | Number of params | 460289 | #11 |
| | | | Ext. data | No | #1 |
| Link Property Prediction | ogbl-ddi | GraphSAGE | Test Hits@20 | 0.5390 ± 0.0474 | #11 |
| | | | Validation Hits@20 | 0.6262 ± 0.0037 | #11 |
| | | | Number of params | 1421057 | #9 |
| | | | Ext. data | No | #1 |
| Link Property Prediction | ogbl-ppa | GraphSAGE | Test Hits@100 | 0.1655 ± 0.0240 | #11 |
| | | | Validation Hits@100 | 0.1724 ± 0.0264 | #10 |
| | | | Number of params | 424449 | #5 |
| | | | Ext. data | No | #1 |
| Node Property Prediction | ogbn-arxiv | GraphSAGE | Test Accuracy | 0.7149 ± 0.0027 | #52 |
| | | | Validation Accuracy | 0.7277 ± 0.0016 | #49 |
| | | | Number of params | 218664 | #36 |
| | | | Ext. data | No | #1 |
| Node Property Prediction | ogbn-mag | NeighborSampling (R-GCN aggr) | Test Accuracy | 0.4678 ± 0.0067 | #15 |
| | | | Validation Accuracy | 0.4761 ± 0.0068 | #15 |
| | | | Number of params | 154366772 | #5 |
| | | | Ext. data | No | #1 |
| Node Property Prediction | ogbn-papers100M | GraphSAGE_res_incep | Test Accuracy | 0.6706 ± 0.0017 | #9 |
| | | | Validation Accuracy | 0.7032 ± 0.0011 | #10 |
| | | | Number of params | 5755172 | #10 |
| | | | Ext. data | No | #1 |
| Node Property Prediction | ogbn-products | GraphSAGE + C&S + node2vec | Test Accuracy | 0.8154 ± 0.0050 | #23 |
| | | | Validation Accuracy | 0.9238 ± 0.0006 | #21 |
| | | | Number of params | 103983 | #38 |
| | | | Ext. data | No | #1 |
| Node Property Prediction | ogbn-products | GraphSAGE | Test Accuracy | 0.7829 ± 0.0016 | #40 |
| | | | Validation Accuracy | not reported | #45 |
| | | | Number of params | not reported | #47 |
| | | | Ext. data | No | #1 |
| Node Property Prediction | ogbn-products | NeighborSampling (SAGE aggr) | Test Accuracy | 0.7870 ± 0.0036 | #38 |
| | | | Validation Accuracy | 0.9170 ± 0.0009 | #32 |
| | | | Number of params | 206895 | #32 |
| | | | Ext. data | No | #1 |
| Node Property Prediction | ogbn-products | Full-batch GraphSAGE | Test Accuracy | 0.7850 ± 0.0014 | #39 |
| | | | Validation Accuracy | 0.9224 ± 0.0007 | #24 |
| | | | Number of params | 206895 | #32 |
| | | | Ext. data | No | #1 |
| Node Property Prediction | ogbn-proteins | GraphSAGE | Test ROC-AUC | 0.7768 ± 0.0020 | #17 |
| | | | Validation ROC-AUC | 0.8334 ± 0.0013 | #15 |
| | | | Number of params | 193136 | #17 |
| | | | Ext. data | No | #1 |
| Node Classification | PATTERN 100k | GraphSage | Accuracy (%) | 50.516 | #10 |
| Node Classification | PPI | GraphSAGE | F1 | 61.2 | #18 |
| Node Classification | PubMed (0.03%) | GraphSAGE | Accuracy | 45.4% | #14 |
| Node Classification | PubMed (0.05%) | GraphSAGE | Accuracy | 53.0% | #13 |
| Node Classification | PubMed (0.1%) | GraphSAGE | Accuracy | 65.4% | #13 |
| Node Classification | Pubmed Full-supervised | GraphSAGE | Accuracy | 87.10% | #7 |
| Node Classification | PubMed with Public Split: fixed 20 nodes per class | GraphSAGE | Accuracy | 76.8% | #25 |
| Node Classification | Reddit | GraphSAGE | Accuracy | 94.32% | #12 |
| Graph Regression | ZINC-500k | GraphSage | MAE | 0.398 | #24 |

Results from Other Papers


| Task | Dataset | Model | Metric | Value | Rank |
|---|---|---|---|---|---|
| Node Classification | Europe Air-Traffic | GraphSAGE (Hamilton et al., 2017a) | Accuracy | 27.2 | #6 |
| Node Classification | Facebook | GraphSAGE (Hamilton et al., 2017a) | Accuracy | 38.9 | #5 |
| Node Classification | Flickr | GraphSAGE (Hamilton et al., 2017a) | Accuracy | 0.641 | #3 |
| Node Classification | USA Air-Traffic | GraphSAGE (Hamilton et al., 2017a) | Accuracy | 31.6 | #7 |
| Node Classification | Wiki-Vote | GraphSAGE (Hamilton et al., 2017a) | Accuracy | 24.5 | #6 |

Methods