D$^2$-GCN: Data-Dependent GCNs for Boosting Both Efficiency and Scalability

29 Sep 2021  ·  Chaojian Li, Xu Ouyang, Yang Zhao, Haoran You, Yonggan Fu, Yuchen Gu, Haonan Liu, Siyuan Miao, Yingyan Lin ·

Graph Convolutional Networks (GCNs) have gained increasing attention thanks to their state-of-the-art (SOTA) performance in graph-based learning tasks. However, their sheer number of node features and large adjacency matrices limit their deployment in real-world applications, as they impose the following challenges: (1) prohibitive inference costs, especially for resource-constrained applications, and (2) low trainability of deep GCNs. To this end, we aim to develop low-cost GCNs with improved trainability, inspired by recent findings in deep neural network optimization which show that not all data and model components are equally important. Specifically, we propose a Data-Dependent GCN framework dubbed D$^2$-GCN, which integrates data-dependent dynamic skipping at multiple granularities: (1) node-wise skipping to bypass aggregating the features of unimportant neighbor nodes and their corresponding combinations; (2) edge-wise skipping to prune the unimportant edge connections of each node; and (3) bit-wise skipping to dynamically adapt the bit precision of both node features and weights. Our D$^2$-GCN identifies the importance of node features via a low-cost indicator, and is thus simple and generally applicable to various graph-based learning tasks.
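Since no code is released yet, the node- and edge-wise skipping idea can be sketched as follows. This is an illustrative sketch under assumptions, not the authors' implementation: the low-cost importance indicator is assumed to be the L1 norm of each node's features, aggregation is plain mean-pooling, and `keep_ratio` is a hypothetical knob.

```python
import numpy as np

def dynamic_skip_aggregate(features, adj, keep_ratio=0.5):
    """One GCN aggregation step with data-dependent skipping.

    Node-wise skip: only the top-k nodes by a low-cost importance
    indicator (here, the L1 norm of their features -- an assumption,
    not necessarily the paper's indicator) contribute as neighbors.
    Edge-wise skip: edges pointing to skipped nodes are pruned.
    """
    n = features.shape[0]
    importance = np.abs(features).sum(axis=1)      # hypothetical low-cost indicator
    k = max(1, int(n * keep_ratio))
    keep = np.argsort(importance)[-k:]             # indices of the important nodes
    mask = np.zeros(n, dtype=bool)
    mask[keep] = True
    adj_pruned = adj * mask[None, :]               # drop edges into skipped neighbors
    deg = adj_pruned.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0                            # avoid division by zero
    return adj_pruned @ features / deg             # mean over the kept neighbors

# Toy 4-node ring graph with 8-dimensional features.
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
features = np.arange(32, dtype=float).reshape(4, 8)
out = dynamic_skip_aggregate(features, adj, keep_ratio=0.5)
print(out.shape)  # (4, 8)
```

Because the pruned adjacency has fewer nonzero entries, both the aggregation FLOPs and the memory traffic for neighbor features shrink in proportion to the fraction of skipped nodes and edges.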
Extensive experiments and ablation studies on six GCN model and dataset pairs consistently validate that the proposed D$^2$-GCN can (1) squeeze out unnecessary costs from both the aggregation and combination phases (e.g., reducing inference FLOPs by $\downarrow$1.1$\times$ $\sim$ $\downarrow$37.0$\times$ and shrinking the energy cost of GCN inference by $\downarrow$1.6$\times$ $\sim$ $\downarrow$8.4$\times$), while offering comparable or even better accuracy (e.g., $\downarrow$0.5% $\sim$ $\uparrow$5.6%); and (2) help GCNs go deeper by boosting their trainability (e.g., providing a $\uparrow$0.8% $\sim$ $\uparrow$5.1% higher accuracy when increasing the model depth from 4 layers to 64 layers), thus achieving comparable or even better accuracy for GCNs with more layers than SOTA techniques (e.g., a $\downarrow$0.4% $\sim$ $\uparrow$38.6% accuracy change for models with 64 layers). All code and pretrained models will be released upon acceptance.
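The bit-wise skipping component can likewise be sketched. Again this is a hedged illustration, not the paper's method: uniform symmetric quantization is assumed, the importance indicator is again a feature L1 norm, and the bit-widths and `keep_ratio` are hypothetical parameters.

```python
import numpy as np

def quantize(x, bits):
    """Uniform symmetric quantization of x to the given bit-width."""
    scale = max(float(np.abs(x).max()), 1e-8)
    levels = 2 ** (bits - 1) - 1                   # e.g., 127 levels for 8 bits
    return np.round(x / scale * levels) / levels * scale

def adaptive_bit_features(features, hi_bits=8, lo_bits=4, keep_ratio=0.5):
    """Quantize important nodes at hi_bits and the rest at lo_bits.

    Importance is a hypothetical low-cost indicator (L1 norm of each
    node's features); the paper's precision-selection rule may differ.
    """
    importance = np.abs(features).sum(axis=1)
    k = max(1, int(len(importance) * keep_ratio))
    hi = np.argsort(importance)[-k:]
    out = quantize(features, lo_bits)              # default: low precision
    out[hi] = quantize(features[hi], hi_bits)      # important nodes: high precision
    return out

features = np.linspace(-1.0, 1.0, 32).reshape(4, 8)
q = adaptive_bit_features(features)
print(np.abs(q - features).max())                  # small quantization error
```

Spending high precision only on the nodes the indicator marks as important is what lets the bit-wise skip trade a bounded accuracy perturbation for lower compute and storage cost.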
