Two Sides of the Same Coin: Heterophily and Oversmoothing in Graph Convolutional Neural Networks

12 Feb 2021 · Yujun Yan, Milad Hashemi, Kevin Swersky, Yaoqing Yang, Danai Koutra

In node classification tasks, graph convolutional neural networks (GCNs) have demonstrated competitive performance over traditional methods on diverse graph data. However, it is known that the performance of GCNs degrades as the number of layers increases (the oversmoothing problem), and recent studies have also shown that GCNs may perform worse on heterophilous graphs, where neighboring nodes tend to belong to different classes (the heterophily problem). These two problems are usually viewed as unrelated and are therefore studied independently, often at the graph filter level from a spectral perspective. We are the first to take a unified perspective that jointly explains the oversmoothing and heterophily problems at the node level. Specifically, we profile the nodes via two quantitative metrics: the relative degree of a node (compared to its neighbors) and the node-level heterophily. Our theory shows that the interplay of these two profiling metrics defines three cases of node behavior, which jointly explain the oversmoothing and heterophily problems and can predict the performance of GCNs. Based on insights from our theory, we show, theoretically and empirically, the effectiveness of two strategies: structure-based edge correction, which learns corrected edge weights from structural properties (i.e., degrees), and feature-based edge correction, which learns signed edge weights from node features. Compared to other approaches, which tend to handle either heterophily or oversmoothing well but not both, we show that our model, GGCN, which incorporates the two strategies, performs well on both problems.
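To make the two node-profiling metrics and the idea of signed edge weights concrete, here is a minimal sketch in plain Python. It rests on several assumptions not taken from the paper: node-level heterophily is computed as the fraction of a node's neighbors with a different label; relative degree is assumed to be the mean over neighbors j of (d_i + 1) / (d_j + 1), with the +1 standing in for a self-loop (the paper's exact definition may differ); and the signed weight of an edge is a fixed cosine similarity between endpoint features, a hypothetical non-learned stand-in for GGCN's learned feature-based edge correction. All function names (`adjacency`, `node_heterophily`, `relative_degree`, `signed_edge_weights`) are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int]


def adjacency(num_nodes: int, edges: Sequence[Edge]) -> List[List[int]]:
    """Hypothetical helper: neighbor lists for an undirected graph (self-loops dropped)."""
    neighbors: List[List[int]] = [[] for _ in range(num_nodes)]
    for u, v in edges:
        if u != v:
            neighbors[u].append(v)
            neighbors[v].append(u)
    return neighbors


def node_heterophily(neighbors: List[List[int]], labels: Sequence[int]) -> List[float]:
    """Assumed definition: fraction of each node's neighbors with a different label.

    Isolated nodes get 0.0 here (an assumption; the quantity is undefined for them).
    """
    result = []
    for i, nbrs in enumerate(neighbors):
        if not nbrs:
            result.append(0.0)
            continue
        different = sum(1 for j in nbrs if labels[j] != labels[i])
        result.append(different / len(nbrs))
    return result


def relative_degree(neighbors: List[List[int]]) -> List[float]:
    """Assumed definition: mean over neighbors j of (d_i + 1) / (d_j + 1).

    Values above 1 mean node i has a higher degree than its typical neighbor.
    Isolated nodes get 1.0 (an assumption).
    """
    deg = [len(nbrs) for nbrs in neighbors]
    result = []
    for i, nbrs in enumerate(neighbors):
        if not nbrs:
            result.append(1.0)
            continue
        ratios = [(deg[i] + 1) / (deg[j] + 1) for j in nbrs]
        result.append(sum(ratios) / len(ratios))
    return result


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def signed_edge_weights(edges: Sequence[Edge],
                        features: Sequence[Sequence[float]]) -> Dict[Edge, float]:
    """Hypothetical, non-learned signed weights in [-1, 1] from feature similarity.

    Dissimilar (likely heterophilous) neighbors get negative weights.
    """
    return {(u, v): cosine(features[u], features[v]) for u, v in edges}


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (1, 3)]  # a small star centered on node 1
    nbrs = adjacency(4, edges)
    print(node_heterophily(nbrs, [0, 1, 0, 1]))
    print(relative_degree(nbrs))  # [0.5, 2.0, 0.5, 0.5]
    feats = [[1.0, 0.0], [-1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
    print(signed_edge_weights(edges, feats))
```

In this toy graph the hub (node 1) has a high relative degree and mixed neighbor labels, while the leaves have low relative degree; the edges joining nodes with opposite features receive weight -1.0, which is the kind of signal a signed aggregation could use to push heterophilous neighbors apart instead of averaging them together.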

| Task | Dataset | Model | Metric | Value (%) | Global rank |
|---|---|---|---|---|---|
| Node Classification | Actor | GGCN | Accuracy | 37.54 ± 1.56 | #15 |
| Node Classification | Chameleon | GGCN | Accuracy | 71.14 ± 1.84 | #25 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | Chameleon (48%/32%/20% fixed splits) | GGCN | 1:1 Accuracy | 71.14 ± 1.84 | #8 |
| Node Classification | Citeseer (48%/32%/20% fixed splits) | GGCN | 1:1 Accuracy | 77.14 ± 1.45 | #9 |
| Node Classification | Cora (48%/32%/20% fixed splits) | GGCN | 1:1 Accuracy | 87.95 ± 1.05 | #13 |
| Node Classification | Cornell | GGCN | Accuracy | 85.68 ± 6.63 | #17 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | Cornell (48%/32%/20% fixed splits) | GGCN | 1:1 Accuracy | 85.68 ± 6.63 | #6 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | Film (48%/32%/20% fixed splits) | GPRGCN | 1:1 Accuracy | 35.16 ± 0.9 | #21 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | Film (48%/32%/20% fixed splits) | GGCN | 1:1 Accuracy | 37.54 ± 1.56 | #6 |
| Node Classification | PubMed (48%/32%/20% fixed splits) | GGCN | 1:1 Accuracy | 89.15 ± 0.37 | #15 |
| Node Classification | Squirrel | GGCN | Accuracy | 55.17 ± 1.58 | #36 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | Squirrel (48%/32%/20% fixed splits) | GGCN | 1:1 Accuracy | 55.17 ± 1.58 | #15 |
| Node Classification | Texas | GGCN | Accuracy | 84.86 ± 4.55 | #31 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | Texas (48%/32%/20% fixed splits) | GGCN | 1:1 Accuracy | 84.86 ± 4.55 | #10 |
| Node Classification | Wisconsin | GGCN | Accuracy | 86.86 ± 3.29 | #33 |
| Node Classification on Non-Homophilic (Heterophilic) Graphs | Wisconsin (48%/32%/20% fixed splits) | GGCN | 1:1 Accuracy | 86.86 ± 3.29 | #15 |

Methods