Random walk fits naturally with this problem because, for most online social networks, the only query we can issue through the interface is to retrieve the neighbors of a given node (i. e., no access to the full graph topology).
In this paper, we first present a comprehensive analysis of the drawbacks of three widely-used random walk based graph sampling algorithms, called re-weighted random walk (RW) algorithm, Metropolis-Hastings random walk (MH) algorithm and maximum-degree random walk (MD) algorithm.
We show that the proposed sampling method, which we call Frontier sampling, exhibits all of the nice sampling properties of a regular random walk.
Our goal in this paper is to obtain a representative (unbiased) sample of Facebook users by crawling its social graph.
While data mining in chemoinformatics studied graph data with dozens of nodes, systems biology and the Internet are now generating graph data with thousands and millions of nodes.
We provide a new graph generator, based on a "forest fire" spreading process, that has a simple, intuitive justification, requires very few parameters (like the "flammability" of nodes), and produces graphs exhibiting the full range of properties observed both in prior work and in the present study.
Random walk is widely applied to sample large-scale graphs due to its simplicity of implementation and solid theoretical foundations of bias analysis.