D$^2$-GCN: Data-Dependent GCNs for Boosting Both Efficiency and Scalability

29 Sep 2021  ·  Chaojian Li, Xu Ouyang, Yang Zhao, Haoran You, Yonggan Fu, Yuchen Gu, Haonan Liu, Siyuan Miao, Yingyan Lin ·

Graph Convolutional Networks (GCNs) have gained increasing attention thanks to their state-of-the-art (SOTA) performance in graph-based learning tasks. However, their sheer number of node features and large adjacency matrices limit their deployment in real-world applications, as they impose the following challenges: (1) prohibitive inference costs, especially for resource-constrained applications, and (2) low trainability of deep GCNs. To this end, we aim to develop low-cost GCNs with improved trainability, inspired by recent findings in deep neural network optimization which show that not all data and model components are equally important. Specifically, we propose a Data-Dependent GCN framework dubbed D$^2$-GCN, which integrates data-dependent dynamic skipping at multiple granularities: (1) node-wise skipping to bypass aggregating the features of unimportant neighbor nodes and their corresponding combinations; (2) edge-wise skipping to prune the unimportant edge connections of each node; and (3) bit-wise skipping to dynamically adapt the bit precision of both node features and weights. Our D$^2$-GCN identifies the importance of node features via a low-cost indicator, and is thus simple and generally applicable to various graph-based learning tasks.
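Since no code is released yet, the node- and edge-wise skipping idea can be sketched as follows. This is an illustrative sketch under assumptions, not the authors' implementation: the low-cost importance indicator is assumed to be the L1 norm of each node's features, aggregation is plain mean-pooling, and `keep_ratio` is a hypothetical knob.

```python
import numpy as np

def dynamic_skip_aggregate(features, adj, keep_ratio=0.5):
    """One GCN aggregation step with data-dependent skipping.

    Node-wise skip: only the top-k nodes by a low-cost importance
    indicator (here, the L1 norm of their features -- an assumption,
    not necessarily the paper's indicator) contribute as neighbors.
    Edge-wise skip: edges pointing to skipped nodes are pruned.
    """
    n = features.shape[0]
    importance = np.abs(features).sum(axis=1)      # hypothetical low-cost indicator
    k = max(1, int(n * keep_ratio))
    keep = np.argsort(importance)[-k:]             # indices of the important nodes
    mask = np.zeros(n, dtype=bool)
    mask[keep] = True
    adj_pruned = adj * mask[None, :]               # drop edges into skipped neighbors
    deg = adj_pruned.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0                            # avoid division by zero
    return adj_pruned @ features / deg             # mean over the kept neighbors

# Toy 4-node ring graph with 8-dimensional features.
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
features = np.arange(32, dtype=float).reshape(4, 8)
out = dynamic_skip_aggregate(features, adj, keep_ratio=0.5)
print(out.shape)  # (4, 8)
```

Because the pruned adjacency has fewer nonzero entries, both the aggregation FLOPs and the memory traffic for neighbor features shrink in proportion to the fraction of skipped nodes and edges.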
Extensive experiments and ablation studies on six GCN model and dataset pairs consistently validate that the proposed D$^2$-GCN can (1) squeeze out unnecessary costs from both the aggregation and combination phases (e.g., reducing inference FLOPs by $\downarrow$1.1$\times$ $\sim$ $\downarrow$37.0$\times$ and shrinking the energy cost of GCN inference by $\downarrow$1.6$\times$ $\sim$ $\downarrow$8.4$\times$), while offering comparable or even better accuracy (e.g., $\downarrow$0.5% $\sim$ $\uparrow$5.6%); and (2) help GCNs go deeper by boosting their trainability (e.g., providing a $\uparrow$0.8% $\sim$ $\uparrow$5.1% higher accuracy when increasing the model depth from 4 layers to 64 layers), thus achieving comparable or even better accuracy for GCNs with more layers than SOTA techniques (e.g., a $\downarrow$0.4% $\sim$ $\uparrow$38.6% accuracy change for models with 64 layers). All code and pretrained models will be released upon acceptance.
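The bit-wise skipping component can likewise be sketched. Again this is a hedged illustration, not the paper's method: uniform symmetric quantization is assumed, the importance indicator is again a feature L1 norm, and the bit-widths and `keep_ratio` are hypothetical parameters.

```python
import numpy as np

def quantize(x, bits):
    """Uniform symmetric quantization of x to the given bit-width."""
    scale = max(float(np.abs(x).max()), 1e-8)
    levels = 2 ** (bits - 1) - 1                   # e.g., 127 levels for 8 bits
    return np.round(x / scale * levels) / levels * scale

def adaptive_bit_features(features, hi_bits=8, lo_bits=4, keep_ratio=0.5):
    """Quantize important nodes at hi_bits and the rest at lo_bits.

    Importance is a hypothetical low-cost indicator (L1 norm of each
    node's features); the paper's precision-selection rule may differ.
    """
    importance = np.abs(features).sum(axis=1)
    k = max(1, int(len(importance) * keep_ratio))
    hi = np.argsort(importance)[-k:]
    out = quantize(features, lo_bits)              # default: low precision
    out[hi] = quantize(features[hi], hi_bits)      # important nodes: high precision
    return out

features = np.linspace(-1.0, 1.0, 32).reshape(4, 8)
q = adaptive_bit_features(features)
print(np.abs(q - features).max())                  # small quantization error
```

Spending high precision only on the nodes the indicator marks as important is what lets the bit-wise skip trade a bounded accuracy perturbation for lower compute and storage cost.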
