Abstract:
|
Recent years have seen the development of a variety of statistical models for analyzing microbial communities at population scale, particularly for sparse (zero-inflated) compositional or count-based measurements. We developed an optimized combination of methods to assess multivariable associations of microbiome features with complex metadata in population-scale observational studies. In a large-scale evaluation across a broad range of scenarios, we identified a combination of generalized linear and hierarchical models that can detect or simulate associations between microbial community features and environmental or human health phenotypes. The models capture characteristics unique to microbiome data, including sparsity, joint effects of biological and sequencing variation, and ecological feature dependencies, and can simulate mock microbial counts that recapitulate training communities. Finally, we applied them to a microbial multi-omics dataset from the Integrative Human Microbiome Project (HMP2); in addition to reproducing established results, this analysis revealed an integrated landscape of inflammatory bowel disease (IBD) across time points and 'omics.
|