Abstract:
|
The study of microbiomes has become a topic of intense interest in last several decades as the development of new sequencing technologies has made DNA data accessible across disciplines. In this paper, we analyze a global dataset to investigate environmental factors that affect topsoil microbiome. As yet, much associated work has focused on linking indicators of microbial health to specific outcomes in various fields, rather than understanding how external factors may influence the microbiome composition itself. This is partially due to limited statistical methods to model abundance counts. The counts are high-dimensional, overdispersed, often zero-inflated, and exhibit complex dependence structures. Additionally, the raw counts are often noisy and compositional, and thus are not directly comparable across samples. Often, practitioners transform the counts to presence-absence indicators, but this transformation discards much of the data. As an alternative, we propose transforming to taxa ranks and develop a Bayesian variable selection model that uses ranks to identify covariates that influence microbiome composition. We show by simulation that the proposed model outperforms competi
|