Keywords: network analysis, unsupervised clustering, exponential-family random graph models, variational-approximation estimation, parametric bootstrap
We describe a framework, based on finite mixture models, for unsupervised clustering of the nodes in a network. It can be applied to discrete-valued networks with hundreds of thousands of nodes and billions of edge variables. Relative to other model-based clustering work for networks, we introduce a more flexible modeling framework, improve the variational-approximation estimation algorithm, discuss and implement standard error estimation via a parametric bootstrap approach, and apply these methods to larger datasets than those seen elsewhere in the literature. The added flexibility comes from novel parameterizations of the model, which offer varying degrees of parsimony, and from the use of exponential-family models, whose structure may be exploited in various theoretical and algorithmic ways.