Abstract:
|
The biological significance of using DNA microarrays to identify groups of genes with similar expression patterns has been demonstrated in numerous studies. Since the inception of microarray technology, virtually all traditional clustering approaches, and some new ones, have been applied in this context. Most of these approaches do not offer a credible assessment of the uncertainty about the clusters they generate. We used the Dirichlet process normal mixture model to cluster gene expression profiles. In this approach, similar individual profiles are assumed to have been generated by a common underlying "pattern" represented by a multivariate normal distribution. The stochastic data-generation process is described in terms of a Bayesian hierarchical model, and groups of genes with similar expression patterns are identified by examining the posterior distribution of clusterings, which is estimated by a Gibbs sampler. In this talk we describe the methodology and demonstrate the practical importance of the conceptually beneficial properties of this approach, such as averaging over models with different numbers of mixture components and the precise treatment of experimental variability.
|
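One common way to examine a posterior distribution of clusterings is to estimate, for each pair of genes, the fraction of sampled clusterings in which the two genes share a cluster. The sketch below illustrates that summary only; it is not the implementation used in this work. The function name `coclustering_probabilities` and the sampled label vectors are hypothetical stand-ins for the output of a Gibbs sampler. Because the number of clusters may differ from one sample to the next, the resulting probabilities average over models with different numbers of mixture components.

```python
def coclustering_probabilities(label_samples):
    """Estimate pairwise co-clustering probabilities from sampled clusterings.

    label_samples: list of label vectors, one per retained sampler iteration.
    Each vector assigns a cluster label to every gene. Labels only need to be
    consistent within a single sample, not across samples.
    """
    if not label_samples:
        raise ValueError("need at least one sampled clustering")
    n_genes = len(label_samples[0])
    counts = [[0] * n_genes for _ in range(n_genes)]
    for labels in label_samples:
        if len(labels) != n_genes:
            raise ValueError("every sample must label the same set of genes")
        for i in range(n_genes):
            for j in range(n_genes):
                # Count the samples in which genes i and j share a cluster.
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    n_samples = len(label_samples)
    return [[c / n_samples for c in row] for row in counts]


# Hypothetical sampler output: four sampled clusterings of five genes.
# The number of clusters varies between samples (three, two, two, three).
samples = [
    [0, 0, 1, 1, 2],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 2],
]
probs = coclustering_probabilities(samples)
print(probs[2][3])  # estimated probability that genes 2 and 3 co-cluster
```

A pair with probability near 1 is grouped together with high confidence, while intermediate values flag genes whose cluster membership is uncertain.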