Abstract:
|
Dirichlet process mixture models (DPMMs) provide a flexible approach to clustering and inference, but they are limited in their ability to integrate external information, such as a graph that describes structural features of a data set. Such graphs arise, for example, with genetic data, where a graph may encode metabolic pathways. In this work, we propose Graph-DPMM, a mixture model for graph-associated data in which latent blocks correspond to connected subgraphs. A central example is the Dirichlet process conditioned to respect the input graph. We investigate the resulting gains in both computational and statistical efficiency, and present an MCMC scheme for posterior inference that takes advantage of spanning trees within the input graph.
|