Non-local Attention Learning on Large Heterogeneous Information Networks

Heterogeneous information networks (HINs) summarize rich structural information in real-world datasets and play an important role in many big data applications. Recently, graph neural networks have been extended to representation learning on HINs. One very recent advancement is the hierarchical attention mechanism, which incorporates both node-wise and semantic-wise attention. However, since an HIN is more likely to be densely connected given its diverse edge types, repeatedly applying graph convolutional layers can make the node embeddings indistinguishable very quickly. To avoid this oversmoothing, existing graph neural networks targeting HINs are generally restricted to shallow structures. Consequently, those approaches ignore information beyond the local neighborhood. This design flaw violates the concept of non-local learning, which emphasizes the importance of capturing long-range dependencies. To address this limitation, we propose a novel framework of non-local attention in heterogeneous information networks (NLAH). Our framework uses a non-local attention structure to complement the hierarchical attention mechanism, so that it leverages both local and non-local information simultaneously. Moreover, a weighted sampling schema is designed for NLAH to reduce the computation cost on large-scale datasets. Extensive experiments on three different real-world heterogeneous information networks show that our framework exhibits extraordinary scalability and outperforms state-of-the-art baselines by significant margins.
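
The oversmoothing effect mentioned in the abstract can be illustrated with a small toy sketch. It is not the NLAH model or any code from the paper: it assumes a hypothetical 4-node dense graph, made-up 2-dimensional features, and plain mean aggregation with no learned weights. The helper names (`row_normalize`, `propagate`, `max_spread`) are likewise invented for illustration. The sketch only shows that stacking such propagation steps shrinks the differences between node embeddings.

```python
# Illustrative sketch only: mean-aggregation propagation on a small dense graph.
# The graph, features, and function names are hypothetical, not from the paper.

def row_normalize(adj):
    """Divide each row of an adjacency matrix by its sum (mean aggregation)."""
    return [[v / sum(row) for v in row] for row in adj]

def propagate(norm_adj, feats):
    """One parameter-free step: each node averages its neighbors' features."""
    n, d = len(feats), len(feats[0])
    return [
        [sum(norm_adj[i][k] * feats[k][j] for k in range(n)) for j in range(d)]
        for i in range(n)
    ]

def max_spread(feats):
    """Largest per-dimension gap between any two node embeddings."""
    return max(max(col) - min(col) for col in zip(*feats))

# Densely connected 4-node graph with self-loops (assumed toy example).
adjacency = [
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 1, 1, 1],
]
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.0]]

norm_adj = row_normalize(adjacency)
spreads = [max_spread(features)]  # spread before any propagation
h = features
for _ in range(8):
    h = propagate(norm_adj, h)
    spreads.append(max_spread(h))

# Report how far apart the node embeddings remain after each number of layers.
for layer, s in enumerate(spreads):
    print(f"layers={layer}  max spread={s:.6f}")
```

In this toy setting the spread between embeddings never grows and becomes very small after a few steps, which is the behavior that pushes existing models toward shallow architectures.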



Results from the Paper

 Ranked #1 on Heterogeneous Node Classification on DBLP (PACT) 14k (Macro-F1 (60% training data) metric)

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
| Heterogeneous Node Classification | DBLP (PACT) 14k | NLAH (2ndprox) | Macro-F1 (60% training data) | 96.48% | #1 |
| Heterogeneous Node Classification | DBLP (PACT) 14k | NLAH (ppmi) | Macro-F1 (60% training data) | 95.91% | #3 |
| Heterogeneous Node Classification | DBLP (PACT) 14k | NLAH (ppr) | Macro-F1 (60% training data) | 95.95% | #2 |

