Abstract:
|
The explosion of metagenomic sequencing data has led to intense interest in modeling microbial community structure, which plays an important role in medicine, agriculture, and ecology. Mixture models have become one of the most common ways to analyze these data, with the underlying mixture components identified with particular ecological niches. In this work, we extend this framework to a hierarchical Dirichlet process with an underlying latent allocation, allowing for a more flexible mixture structure within individual samples. Importantly for interpretation, we demonstrate an explicit connection between these Bayesian nonparametric priors and certain important models in ecological theory. Our formulation admits a Gibbs sampling strategy for efficiently fitting these models to complex data sets, and we provide two examples from the recent literature. We also present a thermodynamic integration approach for determining the optimal number of niches in a data set.
|