Improved Communication Cost in Distributed PageRank Computation – A Theoretical Study
PageRank is a widely used approach for measuring the importance of a node in a graph. Computing PageRank is a fundamental task in numerous applications including web search, machine learning and recommendation systems. The importance of computing PageRanks in a distributed environment has been recognized due to the rapid growth of the graph size in real world. However, only a few previous works can provide a provable complexity and accuracy for distributed PageRank computation. Given a constant $d>0$ and a graph of $n$ nodes and under the well-known congested-clique distributed model, the state-of-the-art approach, Radar-Push, uses $O(\log\log{n}+\log{d})$ communication rounds to approximate the PageRanks within a relative error $O(\frac{1}{\log^d{n}})$. However, Radar-Push entails as large as $O(\log^{2d+3}{n})$ bits of bandwidth (e.g., the communication cost between a pair of nodes per round) in the worst case. In this paper, we provide a new algorithm that uses asymptotically the same communication rounds while significantly improves the bandwidth from $O(\log^{2d+3}{n})$ bits to $O(d\log^3{n})$ bits. To the best of our knowledge, our distributed PageRank algorithm is the first to achieve $o(d\log{n})$ communication rounds with $O(d\log^3{n})$ bits of bandwidth in approximating PageRanks with relative error $O(\frac{1}{\log^d{n}})$ under the congested-clique model.
PDF ICML 2020 PDF