Abstract:
We introduce HITMIX, a new technique for network seed-set expansion, i.e., the problem of identifying a set of graph vertices related to a given seed set of vertices. We use the moments of the graph's hitting-time distribution to quantify the relationship of each non-seed vertex to the seed set. The hitting-time moments are computed deterministically, at a cost that scales with the number of graph edges, which avoids directly sampling a Markov chain over the graph. The moments are then used to fit a mixture model that estimates the probability that each non-seed vertex belongs with the seed set. These membership probabilities let us sort the non-seed vertices and threshold them in a statistically justified way. To the best of our knowledge, HITMIX is the first full statistical model for seed-set expansion that gives vertex-level membership probabilities. Although HITMIX is a global method, its computational complexity is linear in practice, which makes it applicable to large graphs. We provide a high-performance implementation and present computational results on stochastic blockmodels and a small-world network from the SNAP repository.
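As an illustration of what a deterministic hitting-time moment calculation can look like, here is a minimal sketch. It is not the HITMIX implementation. It assumes a connected, undirected, unweighted graph, a simple random walk, and dense NumPy solves, and the function name `hitting_time_moments` is hypothetical. It uses the first-step recurrences m1 = 1 + P m1 and m2 = 1 + 2 P m1 + P m2 on the non-seed vertices, with both moments fixed at zero on the seed set. The mixture-model fitting step is not shown.

```python
import numpy as np

def hitting_time_moments(adj, seeds):
    """First two moments of the random-walk hitting time to `seeds`.

    Hypothetical helper (illustration only). Assumes `adj` is a dense,
    symmetric adjacency matrix of a connected graph, so every non-seed
    vertex can reach the seed set and the linear systems are nonsingular.
    Returns (m1, m2): arrays of length n, zero on the seed vertices.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    P = adj / deg[:, None]  # transition matrix of the simple random walk

    seed_mask = np.zeros(n, dtype=bool)
    seed_mask[list(seeds)] = True
    U = ~seed_mask  # non-seed vertices
    P_UU = P[np.ix_(U, U)]  # walk restricted to non-seed vertices
    A = np.eye(U.sum()) - P_UU
    ones = np.ones(U.sum())

    # First moment: (I - P_UU) m1 = 1
    m1 = np.zeros(n)
    m1[U] = np.linalg.solve(A, ones)

    # Second moment: (I - P_UU) m2 = 1 + 2 P_UU m1
    m2 = np.zeros(n)
    m2[U] = np.linalg.solve(A, ones + 2.0 * P_UU @ m1[U])
    return m1, m2

# Path graph 0 - 1 - 2 with seed set {0}
m1, m2 = hitting_time_moments([[0, 1, 0], [1, 0, 1], [0, 1, 0]], [0])
print(m1)  # expected hitting times: 0, 3, 4
print(m2)  # second moments: 0, 17, 24
```

For large sparse graphs, one would replace the dense solves with a sparse iterative solver. This keeps the per-iteration cost proportional to the number of edges, which is consistent with the scaling claimed above.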