We provide a new graph generator, based on a "forest fire" spreading process, that has a simple, intuitive justification, requires very few parameters (like the "flammability" of nodes), and produces graphs exhibiting the full range of properties observed both in prior work and in the present study.
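To make the "forest fire" spreading idea concrete, the following is a minimal sketch of a simplified, undirected variant. It is an illustration under stated assumptions, not the generator as specified in the paper: the function name `forest_fire_graph`, the single burning probability `p` (standing in for the "flammability" parameter), and the geometric rule for how many neighbours catch fire are all choices made here for brevity.

```python
import random


def forest_fire_graph(n, p, seed=None):
    """Grow an undirected graph on nodes 0..n-1 with a simplified forest-fire rule.

    Assumption: one burning probability p in [0, 1) replaces the model's
    separate forward/backward parameters.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    rng = random.Random(seed)
    adj = {0: set()}
    for v in range(1, n):
        # The new node picks a uniformly random existing node as its "ambassador".
        ambassador = rng.randrange(v)
        burned = {ambassador}
        frontier = [ambassador]
        while frontier:
            w = frontier.pop()
            # Number of neighbours to ignite: geometric with mean p / (1 - p).
            k = 0
            while rng.random() < p:
                k += 1
            candidates = sorted(u for u in adj[w] if u not in burned)
            rng.shuffle(candidates)
            for u in candidates[:k]:
                burned.add(u)
                frontier.append(u)
        # The new node links to every node the fire reached.
        adj[v] = set()
        for w in burned:
            adj[v].add(w)
            adj[w].add(v)
    return adj
```

With `p = 0` the fire never spreads past the ambassador and the result is a random tree; larger values of `p` let each new node attach to a whole burned neighbourhood, which is what produces denser, more clustered graphs.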
While data mining in chemoinformatics studied graph data with dozens of nodes, systems biology and the Internet are now generating graph data with thousands and millions of nodes.
Our goal in this paper is to obtain a representative (unbiased) sample of Facebook users by crawling its social graph.
In this paper, we propose to use the concept of shortest path for sampling social networks.
In this paper, we propose non-backtracking random walk with re-weighting (NBRW-rw) and the Metropolis-Hastings (MH) algorithm with delayed acceptance (MHDA), which are theoretically guaranteed to achieve, at almost no additional cost, not only unbiased graph sampling but also higher efficiency (smaller asymptotic variance of the resulting unbiased estimators) than the simple random walk with re-weighting (SRW-rw) and the MH algorithm, respectively.
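The non-backtracking walk with re-weighting can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a connected, undirected graph given as a dict of neighbour lists with no isolated nodes, and the name `nbrw_estimate` and its parameters are hypothetical. The walk avoids returning to the node it just left whenever another neighbour exists; because it still visits nodes in proportion to their degree, each sample is weighted by 1/degree to estimate the uniform average of `f` over nodes.

```python
import random


def nbrw_estimate(adj, f, start, steps, seed=None):
    """Estimate the node-average of f with a non-backtracking random walk.

    adj: dict mapping each node to a list of its neighbours (undirected).
    Samples are re-weighted by 1/degree to offset the walk's degree bias.
    """
    rng = random.Random(seed)
    prev, cur = None, start
    num = 0.0
    den = 0.0
    for _ in range(steps):
        d = len(adj[cur])
        num += f(cur) / d
        den += 1.0 / d
        # Exclude the node we just came from, unless it is the only neighbour.
        choices = [u for u in adj[cur] if u != prev]
        if not choices:
            choices = list(adj[cur])
        prev, cur = cur, rng.choice(choices)
    return num / den


# Triangle 0-1-2 with a pendant node 3 attached to node 2; true mean degree is 2.
example = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
mean_degree = nbrw_estimate(example, lambda v: len(example[v]), 0, 50000, seed=0)
```

Dropping the `choices` filter turns this into the simple random walk with the same re-weighting, which makes it easy to compare the spread of the two estimators over repeated runs.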
Methodology; Data Structures and Algorithms; Networking and Internet Architecture; Social and Information Networks; Data Analysis, Statistics and Probability; Physics and Society
Thus graph sampling is essential. The natural questions to ask are (a) which sampling method to use, (b) how small the sample size can be, and (c) how to scale up the measurements of the sample (e.g., the diameter) to get estimates for the large graph.
Studying real-world networks such as social networks or web networks is a challenge.
Many communication and social networks have power-law link distributions, containing a few nodes which have a very high degree and many with low degree.
Most studies of networks have only looked at small subsets of the true network.
In this paper, we develop methods to “sample” a small realistic graph from a large real network.