Sampling Before Training: Rethinking the Effect of Edges in the Process of Training Graph Neural Networks

29 Sep 2021  ·  Hengyuan Ma, Qi Yang, Bowen Sun, Long Shun, Junkui Li, Jianfeng Feng

Graph neural networks (GNNs) demonstrate excellent performance on many graph-based tasks; however, they impose a heavy computational burden when trained on a large-scale graph. Although various sampling methods have been proposed to speed up GNN training by shrinking the scale of the graph during training, they are not applicable when sampling must be performed before training. In this paper, inspired by a manifold learning algorithm called the diffusion map, we quantify the importance of every edge in the graph for training by the extra information it conveys beyond the node features. Based on this measure, we propose Graph Diffusion Sampling (GDS), a simple but effective method for shrinking the edge set before training. GDS preferentially samples edges of high importance, and edges dropped by GDS are never used during training. We empirically show that GDS preserves the edges crucial for training across a variety of models (GCN, GraphSAGE, GAT, and JKNet). Compared to training on the full graph, GDS maintains model performance while sampling only a small fraction of the edges.
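To make the pre-training sampling idea concrete, here is a minimal sketch of importance-weighted edge sampling in Python. The abstract does not specify the diffusion-map-based importance formula, so the `edge_importance` function below uses a hypothetical stand-in (feature dissimilarity between endpoints); only the overall scheme — score every edge once, then keep a high-importance subset before training begins — follows the description above.

```python
import numpy as np

def edge_importance(x, edges):
    """Hypothetical importance score: feature dissimilarity between
    an edge's endpoints. This is a placeholder for the paper's
    diffusion-map-based measure, which the abstract does not detail."""
    src, dst = edges[:, 0], edges[:, 1]
    return np.linalg.norm(x[src] - x[dst], axis=1)

def gds_sample(x, edges, keep_ratio=0.2, seed=None):
    """Sample a fraction of the edge set once, before training,
    preferring edges with high importance. Dropped edges are
    never revisited during training."""
    rng = np.random.default_rng(seed)
    scores = edge_importance(x, edges) + 1e-12  # avoid all-zero weights
    probs = scores / scores.sum()
    k = max(1, int(keep_ratio * len(edges)))
    idx = rng.choice(len(edges), size=k, replace=False, p=probs)
    return edges[idx]

# Usage: a random graph with 100 nodes, 500 edges, 16-dim features.
x = np.random.randn(100, 16)
edges = np.random.randint(0, 100, size=(500, 2))
sub_edges = gds_sample(x, edges, keep_ratio=0.2, seed=0)
print(sub_edges.shape)  # (100, 2): 20% of the original edges kept
```

The sampled `sub_edges` would then be passed to any GNN (GCN, GraphSAGE, GAT, JKNet) in place of the full edge set, with no change to the training loop itself.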
