Abstract:
|
The analysis of microbiome data is often based on dimension-reduced graphical displays and on clustering derived from the vectors of microbial abundances measured in each sample. Principal coordinate analysis (or multidimensional scaling), in particular, is often performed using a phylogenetically defined distance to incorporate context-dependent, non-Euclidean structure. Here we describe how to take a step beyond ordination plots and incorporate this structure into a penalized regression model. Of interest is modeling the association between high-dimensional microbial abundance profiles and a subject's phenotype or clinical condition. We propose a framework for estimating a regression coefficient vector via the joint eigenproperties of several similarity matrices (or kernels). The approach also allows one to incorporate the appropriate geometry of compositional data (relative abundances) into the structure of the penalized regression. Finally, recent inferential methods for this framework provide a means of assigning significance to each individual taxon for its association with the outcome.
|
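To make the general idea concrete, here is a minimal sketch of one way a similarity kernel can enter a penalized regression. It is a generic generalized-ridge estimator, not the estimator proposed in this work. It assumes the data are first mapped to compositional (Aitchison) geometry with a centered log-ratio (clr) transform. It also assumes a taxon-by-taxon kernel `K`, built from a distance matrix, defines the penalty `b' Q b` with `Q = (K + eps*I)^{-1}`. The helper names (`clr`, `gaussian_kernel`, `kernel_penalized_ridge`), the pseudocount, the Gaussian kernel, `eps`, and the toy "phylogenetic" distance are all illustrative assumptions.

```python
import numpy as np

def clr(counts, pseudocount=0.5):
    # Centered log-ratio transform (assumed pseudocount for zeros): log abundances
    # minus each sample's mean log abundance, so every row sums to zero.
    logx = np.log(counts + pseudocount)
    return logx - logx.mean(axis=1, keepdims=True)

def gaussian_kernel(dist, scale=1.0):
    # Hypothetical choice: turn a taxon-by-taxon distance matrix into a similarity kernel.
    return np.exp(-(dist ** 2) / (2.0 * scale ** 2))

def kernel_penalized_ridge(X, y, K, lam, eps=1e-3):
    # Generalized ridge: minimize ||y - X b||^2 + lam * b' Q b, with Q = (K + eps I)^{-1}.
    # This matches a Gaussian prior whose covariance is proportional to K + eps I,
    # so coefficients of taxa that are similar under K are shrunk toward each other.
    p = K.shape[0]
    Q = np.linalg.inv(K + eps * np.eye(p))
    return np.linalg.solve(X.T @ X + lam * Q, X.T @ y)

# Toy usage with simulated counts (not real data).
rng = np.random.default_rng(0)
n, p = 40, 10
counts = rng.poisson(20, size=(n, p))
X = clr(counts)
pos = np.arange(p, dtype=float)
dist = np.abs(pos[:, None] - pos[None, :])   # stand-in for a phylogenetic distance
K = gaussian_kernel(dist, scale=2.0)
true_b = np.zeros(p)
true_b[:3] = 1.0                             # hypothetical signal on three taxa
y = X @ true_b + rng.normal(scale=0.1, size=n)
beta_hat = kernel_penalized_ridge(X, y, K, lam=1.0)
```

Because the clr rows sum to zero, `X.T @ X` is singular on its own. The positive-definite penalty term `lam * Q` is what keeps the linear system solvable in this sketch.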