Abstract:
Metagenomic sequencing allows researchers to gather abundance data rapidly and inexpensively on a nearly complete set of the microbial taxa within environmental samples. How to model the structure of the ecological dynamics that these data reveal is an important applied statistical problem, with consequences for environmental science and medicine. The Dirichlet-multinomial mixture model (DMM) has become the gold standard for analyzing these datasets, providing estimates of the underlying clusters that give rise to the data. However, this approach makes strong assumptions about the functional equivalence of taxa within the ecology, and these assumptions are often violated in practice. I show how this issue can be resolved by introducing latent factors that combine to give a Dirichlet-multinomial likelihood. Taking a Bayesian approach, I provide a reversible-jump Markov chain Monte Carlo implementation that efficiently infers the latent factors. Applying the model to two metagenomic datasets and a classical plankton dataset, I show that the latent factor model offers improved interpretability over the DMM, and I conclude by discussing possible computational refinements.
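For reference, the Dirichlet-multinomial likelihood that both the DMM and the latent factor model build on can be written down directly. The sketch below is a minimal Python illustration, assuming a sample is a vector of taxon counts and each mixture component has its own concentration vector. It computes the standard Dirichlet-multinomial log-probability and the log-likelihood of a finite mixture of such components (the DMM case). The function names `dm_logpmf` and `dmm_loglik` are hypothetical, and the sketch does not implement the latent factor model or its reversible-jump sampler.

```python
import math
from typing import Sequence


def dm_logpmf(counts: Sequence[int], alpha: Sequence[float]) -> float:
    """Log-probability of a count vector under a Dirichlet-multinomial.

    P(x | alpha) = n! Gamma(A) / Gamma(n + A)
                   * prod_k Gamma(x_k + alpha_k) / (x_k! Gamma(alpha_k)),
    where n = sum(x) and A = sum(alpha).
    """
    n = sum(counts)
    a = sum(alpha)
    out = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    for x, al in zip(counts, alpha):
        out += math.lgamma(x + al) - math.lgamma(x + 1) - math.lgamma(al)
    return out


def dmm_loglik(
    samples: Sequence[Sequence[int]],
    weights: Sequence[float],
    alphas: Sequence[Sequence[float]],
) -> float:
    """Log-likelihood of independent samples under a finite DM mixture.

    Each sample's likelihood is sum_j w_j * DM(x | alpha_j); the sum is
    computed with log-sum-exp for numerical stability.
    """
    total = 0.0
    for counts in samples:
        terms = [math.log(w) + dm_logpmf(counts, a) for w, a in zip(weights, alphas)]
        m = max(terms)
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total


if __name__ == "__main__":
    # Two hypothetical samples over three taxa, two hypothetical clusters.
    samples = [[10, 2, 0], [1, 3, 8]]
    weights = [0.6, 0.4]
    alphas = [[5.0, 1.0, 0.5], [0.5, 1.5, 4.0]]
    print(dmm_loglik(samples, weights, alphas))
```

With all concentrations equal to 1 and two taxa, every split of n reads is equally likely, so `dm_logpmf([1, 1], [1.0, 1.0])` equals log(1/3); this is a quick sanity check on the normalizing constants.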