Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods

Many widely used datasets for graph machine learning tasks have generally been homophilous, where nodes with similar labels connect to each other. Recently, new Graph Neural Networks (GNNs) have been developed that move beyond the homophily regime; however, their evaluation has often been conducted on small graphs with limited application domains. We collect and introduce diverse non-homophilous datasets from a variety of application areas that have up to 384x more nodes and 1398x more edges than prior datasets. We further show that existing scalable graph learning and graph minibatching techniques lead to performance degradation on these non-homophilous datasets, thus highlighting the need for further work on scalable non-homophilous methods. To address these concerns, we introduce LINKX, a strong, simple method that admits straightforward minibatch training and inference. Extensive experimental results with representative simple methods and GNNs across our proposed datasets show that LINKX achieves state-of-the-art performance for learning on non-homophilous graphs. Our code and data are available at https://github.com/CUAI/Non-Homophily-Large-Scale.
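The notion of homophily in the abstract can be made concrete with a small calculation. The sketch below computes one common measure, the edge homophily ratio (the fraction of edges whose endpoints share a label). This is an illustrative assumption, not necessarily the measure used in the paper; the function name `edge_homophily` and the toy graph are hypothetical.

```python
from typing import Sequence, Tuple


def edge_homophily(edges: Sequence[Tuple[int, int]], labels: Sequence[int]) -> float:
    """Return the fraction of edges whose two endpoints carry the same label.

    Assumption: `edges` lists each edge once as a (source, target) pair of
    node indices, and `labels[i]` is the class label of node i.
    """
    if not edges:
        raise ValueError("the graph has no edges")
    same_label = sum(1 for u, v in edges if labels[u] == labels[v])
    return same_label / len(edges)


# Toy graph (hypothetical): four nodes, two classes, four edges.
labels = [0, 0, 1, 1]
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
ratio = edge_homophily(edges, labels)  # two of the four edges join same-label nodes
```

A ratio near 1 indicates a homophilous graph, while a low ratio indicates that edges mostly connect nodes with different labels, the regime the datasets in this paper target.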

NeurIPS 2021

Results from the Paper


| Task | Dataset | Model | Metric | Value | Global Rank |
| --- | --- | --- | --- | --- | --- |
| Node Classification | Actor | LINKX | Accuracy | 36.10 ± 1.55 | #29 |
| Node Classification | arXiv-year | LINKX | Accuracy | 56.00 ± 1.34 | #5 |
| Node Classification | Chameleon | LINKX | Accuracy | 68.42 ± 1.38 | #32 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | Chameleon (48%/32%/20% fixed splits) | LINKX | 1:1 Accuracy | 68.42 ± 1.38 | #15 |
| Node Classification | Citeseer (48%/32%/20% fixed splits) | LINKX | 1:1 Accuracy | 73.19 ± 0.99 | #25 |
| Node Classification | Cora (48%/32%/20% fixed splits) | LINKX | 1:1 Accuracy | 84.64 ± 1.13 | #24 |
| Node Classification | Cornell | LINKX | Accuracy | 77.84 ± 5.81 | #39 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | Cornell (48%/32%/20% fixed splits) | LINKX | 1:1 Accuracy | 77.84 ± 5.81 | #22 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | Film (48%/32%/20% fixed splits) | LINKX | 1:1 Accuracy | 36.10 ± 1.55 | #17 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | genius | GATJK | 1:1 Accuracy | 56.70 ± 2.07 | #26 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | genius | GCNJK | 1:1 Accuracy | 89.30 ± 0.19 | #15 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | genius | LINK | 1:1 Accuracy | 73.56 ± 0.14 | #23 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | genius | L Prop 2-hop | 1:1 Accuracy | 67.04 ± 0.20 | #24 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | genius | L Prop 1-hop | 1:1 Accuracy | 66.02 ± 0.16 | #25 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | genius | MLP | 1:1 Accuracy | 86.68 ± 0.09 | #17 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | genius | LINKX | 1:1 Accuracy | 90.77 ± 0.27 | #10 |
| Node Classification | genius | LINKX | Accuracy | 90.77 ± 0.27 | #8 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | Penn94 | GATJK | 1:1 Accuracy | 80.69 ± 0.36 | #18 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | Penn94 | GCNJK | 1:1 Accuracy | 81.63 ± 0.54 | #13 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | Penn94 | LINK | 1:1 Accuracy | 80.79 ± 0.49 | #17 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | Penn94 | L Prop 2-hop | 1:1 Accuracy | 74.13 ± 0.46 | #25 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | Penn94 | L Prop 1-hop | 1:1 Accuracy | 63.21 ± 0.39 | #28 |
| Node Classification | Penn94 | LINKX | Accuracy | 84.71 ± 0.52 | #9 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | Penn94 | LINKX | 1:1 Accuracy | 84.71 ± 0.52 | #7 |
| Node Classification | pokec | LINKX | Accuracy | 82.04 ± 0.07 | #4 |
| Node Classification | PubMed (48%/32%/20% fixed splits) | LINKX | 1:1 Accuracy | 87.86 ± 0.77 | #23 |
| Node Classification | snap-patents | LINKX | Accuracy | 61.95 ± 0.12 | #4 |
| Node Classification | Squirrel | LINKX | Accuracy | 61.81 ± 1.80 | #19 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | Squirrel (48%/32%/20% fixed splits) | LINKX | 1:1 Accuracy | 61.81 ± 1.80 | #8 |
| Node Classification | Texas | LINKX | Accuracy | 74.60 ± 8.37 | #45 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | Texas (48%/32%/20% fixed splits) | LINKX | 1:1 Accuracy | 74.60 ± 8.37 | #23 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | twitch-gamers | GCNJK | 1:1 Accuracy | 63.45 ± 0.22 | #17 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | twitch-gamers | L Prop 1-hop | 1:1 Accuracy | 62.77 ± 0.24 | #19 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | twitch-gamers | L Prop 2-hop | 1:1 Accuracy | 63.88 ± 0.24 | #15 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | twitch-gamers | LINK | 1:1 Accuracy | 64.85 ± 0.21 | #13 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | twitch-gamers | LINKX | 1:1 Accuracy | 66.06 ± 0.19 | #6 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | twitch-gamers | MLP | 1:1 Accuracy | 60.92 ± 0.07 | #23 |
| Node Classification | twitch-gamers | LINKX | Accuracy | 66.06 ± 0.19 | #2 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | twitch-gamers | GATJK | 1:1 Accuracy | 59.98 ± 2.87 | #24 |
| Node Classification | wiki | LINKX | Accuracy | 59.80 ± 0.41 | #2 |
| Node Classification | Wisconsin | LINKX | Accuracy | 75.49 ± 5.72 | #45 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | Wisconsin (48%/32%/20% fixed splits) | LINKX | 1:1 Accuracy | 75.49 ± 5.72 | #23 |
