Abstract:
Human microbiome studies have revealed an essential role of the microbiome in health and disease, opening up the possibility of building microbiome-based predictive models. A unique characteristic of microbiome data is the phylogenetic tree that relates all microbial taxa. It has frequently been observed that one or more clusters of taxa are associated with an outcome because of shared biological functions (a clustered signal). Depending on the condition, a large or a small number of taxa may be involved, corresponding to two distinct biological models (dense and sparse). We therefore develop two methods: “glmmTree”, a phylogeny-regularized generalized linear mixed model for clustered and dense signals, and “SICS”, a phylogeny-regularized sparse generalized linear model for clustered and sparse signals. glmmTree uses the global phylogeny-based similarity between microbiomes to predict the outcome, whereas SICS performs variable selection and uses a novel phylogeny-based smoothness penalty to smooth the coefficients of related taxa. We demonstrate the performance of the proposed methods through simulation studies and real data applications.
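The abstract describes SICS as combining variable selection with a phylogeny-based smoothness penalty but does not give the penalty's form. The sketch below only illustrates that general idea under stated assumptions, and it is not the SICS method or its implementation. It assumes a Gaussian outcome with squared-error loss instead of a general GLM loss. It adds an L1 penalty for sparsity and a graph-Laplacian penalty λ2 Σ_{j<k} w_jk (β_j − β_k)², which pulls the coefficients of similar taxa toward each other. The weights w_jk come from phylogenetic distances through a hypothetical exponential kernel. The model is fitted by proximal gradient descent (ISTA). The names `similarity_from_distances`, `laplacian`, `fit_phylo_smoothed_lasso` and the `bandwidth` parameter are all hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def similarity_from_distances(dist: Matrix, bandwidth: float = 1.0) -> Matrix:
    """HYPOTHETICAL weighting: w_jk = exp(-d_jk / bandwidth) off the diagonal, 0 on it.

    `dist` is assumed to hold pairwise phylogenetic (e.g. patristic) distances.
    """
    p = len(dist)
    return [[0.0 if j == k else math.exp(-dist[j][k] / bandwidth) for k in range(p)]
            for j in range(p)]


def laplacian(w: Matrix) -> Matrix:
    """L = D - W, so beta^T L beta = sum_{j<k} w_jk (beta_j - beta_k)^2 for symmetric W."""
    p = len(w)
    return [[(sum(w[j]) if j == k else 0.0) - w[j][k] for k in range(p)] for j in range(p)]


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t * |z| (the lasso shrinkage step)."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def fit_phylo_smoothed_lasso(X: Matrix, y: List[float], w: Matrix, lam1: float,
                             lam2: float, n_iter: int = 500) -> Tuple[List[float], List[float]]:
    """Minimise (1/2n)||y - X beta||^2 + lam2 * beta^T L beta + lam1 * ||beta||_1 by ISTA.

    Squared-error loss is an ASSUMPTION (Gaussian outcome); `w` must be symmetric
    and non-negative. Returns the coefficients and the objective after each step.
    """
    n, p = len(X), len(X[0])
    L = laplacian(w)

    def predict(beta: List[float]) -> List[float]:
        return [sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]

    def objective(beta: List[float]) -> float:
        rss = sum((y[i] - f) ** 2 for i, f in enumerate(predict(beta)))
        smooth = sum(beta[j] * L[j][k] * beta[k] for j in range(p) for k in range(p))
        return rss / (2 * n) + lam2 * smooth + lam1 * sum(abs(b) for b in beta)

    # Conservative Lipschitz bound for the gradient of the smooth part:
    # lambda_max(X^T X / n) <= trace(X^T X) / n and lambda_max(L) <= 2 * max degree.
    lip = (sum(x * x for row in X for x in row) / n
           + 4 * lam2 * max(sum(row) for row in w))
    step = 1.0 / max(lip, 1e-12)

    beta = [0.0] * p
    history = [objective(beta)]
    for _ in range(n_iter):
        resid = [y[i] - f for i, f in enumerate(predict(beta))]
        # Gradient of the smooth part: -(1/n) X^T resid + 2 * lam2 * L beta
        grad = [-sum(X[i][j] * resid[i] for i in range(n)) / n
                + 2 * lam2 * sum(L[j][k] * beta[k] for k in range(p))
                for j in range(p)]
        # Gradient step on the smooth part, then soft-thresholding for the L1 part.
        beta = [soft_threshold(beta[j] - step * grad[j], step * lam1) for j in range(p)]
        history.append(objective(beta))
    return beta, history


# Usage: two taxa with a strong (hypothetical) phylogenetic link are shrunk toward each other.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, -1.0]
w = [[0.0, 0.125], [0.125, 0.0]]
beta, history = fit_phylo_smoothed_lasso(X, y, w, lam1=0.0, lam2=1.0)
print([round(b, 6) for b in beta])  # [0.5, -0.5]; with lam2=0.0 it would be [1.0, -1.0]
```

A fixed step of 1/(Lipschitz bound) keeps ISTA's objective non-increasing without a line search. With enough weight on the L1 term, every coefficient is set exactly to zero, which is the variable-selection behaviour. The pure-Python loops keep the sketch dependency-free. A practical version would use a numerical library, a GLM loss for non-Gaussian outcomes, and tuned λ1 and λ2.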