When auxiliary node characteristics are available in addition to linkage information, they often help to differentiate nodes of interest from background nodes.
We generalize the stochastic blockmodel to propose novel community detection methods that simultaneously use node covariates to determine whether nodes belong to the background set and cluster nodes into inherently cohesive groups.
A variant of the EM algorithm is adapted to maximize a joint pseudo-likelihood, assuming latent group memberships and Poisson- or multinomial-distributed numbers of links within and between groups.
Asymptotic consistency of the label assignments is proven. Superior performance and robustness in finite samples are observed in simulation studies. The proposed method identifies previously missed gene sets underlying autism and related neurological diseases using de novo mutations, gene expression, and protein interaction data.
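
To make the pseudo-likelihood EM idea concrete, the following is a minimal sketch under simplifying assumptions, not the method proposed here: it handles only a plain K-block model with Poisson-distributed block sums, and it omits the background set and the node covariates entirely. The function name `pseudo_likelihood_em` and all of its parameters are hypothetical, chosen for illustration. Given initial labels, each node's links are summed by current group, the block sums are treated as independent Poisson counts whose means depend on the node's latent group, and a standard mixture EM is run before the labels are refreshed.

```python
import numpy as np

def pseudo_likelihood_em(A, init_labels, K, n_outer=5, n_inner=20, eps=1e-10):
    """Illustrative Poisson pseudo-likelihood EM for a plain K-block model.

    Simplifying assumptions (not from the source): no background set,
    no node covariates, and a user-supplied initial labelling.
    """
    labels = np.asarray(init_labels).copy()
    for _ in range(n_outer):
        # One-hot encoding of the current labels (n x K).
        Z = np.eye(K)[labels]
        # Block sums: B[i, k] = number of links from node i to nodes labelled k.
        B = A @ Z
        # Initialise mixing proportions and Poisson means from the current labels.
        pi = Z.mean(axis=0) + eps
        pi /= pi.sum()
        # lam[l, k] = mean block sum towards group k among nodes in group l.
        lam = (Z.T @ B) / (Z.sum(axis=0)[:, None] + eps) + eps
        for _ in range(n_inner):
            # E-step: posterior group probabilities. The log(B!) term is
            # constant across groups and is dropped.
            log_post = np.log(pi)[None, :] + B @ np.log(lam).T - lam.sum(axis=1)[None, :]
            log_post -= log_post.max(axis=1, keepdims=True)
            tau = np.exp(log_post)
            tau /= tau.sum(axis=1, keepdims=True)
            # M-step: weighted updates of proportions and Poisson means.
            pi = tau.mean(axis=0) + eps
            pi /= pi.sum()
            lam = (tau.T @ B) / (tau.sum(axis=0)[:, None] + eps) + eps
        # Refresh hard labels, then recompute block sums in the next outer pass.
        labels = tau.argmax(axis=1)
    return labels, tau
```

The function takes a symmetric adjacency matrix `A`, an integer label vector with values in `0..K-1`, and the number of groups `K`, and returns the hard labels together with the posterior membership matrix. Recomputing the block sums in each outer pass is a design choice of this sketch; the initial labelling could come from any rough clustering.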