Academic Expert Finding via $(k,\mathcal{P})$-Core based Embedding over Heterogeneous Graphs

26 Jul 2021 · Anonymous

Finding relevant experts in specified areas is often crucial for a wide range of applications in both academia and industry. Given a user query and a large body of academic knowledge (e.g., academic papers), expert finding aims to identify and rank, from that knowledge, the experts most relevant to the query. Existing studies mainly focus on embedding-based solutions that (1) measure the textual semantic similarity between academic papers and a given query through document representation models and (2) extract the top-$n$ experts with the greatest similarities. Beyond the implicit textual semantics of papers, however, the papers' explicit relationships (e.g., co-authorship, citation, and same-topic relationships) in a heterogeneous academic graph (e.g., DBLP) are critical for document representation, because they help improve expert finding quality. Despite their importance, these explicit relationships have generally been ignored in the literature. In this paper, we study academic expert finding on heterogeneous graphs by capturing both the explicit relationships and the implicit textual semantics of papers in a single representation model. Specifically, we first define the $(k,\mathcal{P})$-core to denote a cohesive community of papers that are closely connected via a meta-path $\mathcal{P}$, where different choices of $\mathcal{P}$ encode different relationships between papers. We then propose an offline $(k,\mathcal{P})$-core based document embedding model that captures papers' various explicit relationships in their representations. Using the resulting paper embeddings, we present an online threshold algorithm (TA)-based method that efficiently returns the top-$n$ experts via a carefully designed proximity graph-based index (PG-Index). We further extend our approach to support multiple relationships simultaneously in the representation. Extensive experiments on real-world datasets demonstrate the effectiveness and efficiency of our approach.
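The abstract only sketches the $(k,\mathcal{P})$-core. As an illustration, the minimal Python sketch below assumes the definition commonly used in community search over heterogeneous information networks: a set of papers in which every paper has at least $k$ $\mathcal{P}$-neighbors (papers reachable through an instance of a symmetric meta-path $\mathcal{P}$, such as paper-author-paper) inside the set. Under that assumption the core can be computed by repeatedly peeling papers with too few $\mathcal{P}$-neighbors. The toy graph, the adjacency-dict input format, and the names `p_neighbors`, `kp_core`, and `PAP` are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Set


def p_neighbors(graph: Dict[str, List[str]], node_type: Dict[str, str],
                start: str, meta_path: List[str]) -> Set[str]:
    """Nodes reachable from `start` by following an instance of `meta_path`.

    `meta_path` is a list of node types, e.g. ["paper", "author", "paper"].
    The start node itself is excluded from its own P-neighbors.
    """
    frontier = {start}
    for wanted_type in meta_path[1:]:
        # Step one hop, keeping only neighbors of the type the meta-path expects.
        frontier = {nbr for v in frontier for nbr in graph[v]
                    if node_type[nbr] == wanted_type}
    frontier.discard(start)
    return frontier


def kp_core(graph: Dict[str, List[str]], node_type: Dict[str, str],
            meta_path: List[str], k: int) -> Set[str]:
    """Peel target-type nodes with fewer than k P-neighbors until none remain.

    Assumes (hypothetically) a symmetric meta-path, so that the P-neighbor
    relation is symmetric and peeling yields the maximal (k, P)-core.
    """
    target = meta_path[0]
    nodes = [v for v in graph if node_type[v] == target]
    nbrs = {v: p_neighbors(graph, node_type, v, meta_path) for v in nodes}
    degree = {v: len(nbrs[v]) for v in nodes}
    alive = set(nodes)
    doomed = {v for v in nodes if degree[v] < k}  # already scheduled for removal
    stack = list(doomed)
    while stack:
        v = stack.pop()
        alive.discard(v)
        for u in nbrs[v]:
            if u in alive and u not in doomed:
                degree[u] -= 1  # u loses v as a P-neighbor
                if degree[u] < k:
                    doomed.add(u)
                    stack.append(u)
    return alive


# Hypothetical toy graph: author a1 wrote p1, p2, p3; author a2 wrote p3, p4.
graph = {
    "p1": ["a1"], "p2": ["a1"], "p3": ["a1", "a2"], "p4": ["a2"],
    "a1": ["p1", "p2", "p3"], "a2": ["p3", "p4"],
}
node_type = {v: ("paper" if v.startswith("p") else "author") for v in graph}
PAP = ["paper", "author", "paper"]  # co-authorship meta-path

print(sorted(kp_core(graph, node_type, PAP, 2)))  # ['p1', 'p2', 'p3']
```

With $k = 2$, paper p4 is peeled because it shares an author only with p3, while p1, p2, and p3 each keep at least two co-authorship $\mathcal{P}$-neighbors. Once the $\mathcal{P}$-neighbor sets are materialized, peeling takes time linear in the number of $\mathcal{P}$-neighbor pairs; on large graphs, materializing those pairs is usually the expensive step.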
