Abstract:
|
Microbiome data allow biologists to use the composition of microbes in an environment to predict phenotypes of interest. Much statistical work has focused on two technical challenges in the analysis of microbiome data: (i) the data are high-dimensional, i.e., there are a large number of microbes, and (ii) it is not meaningful to directly compare the raw absolute abundances measured, so compositional data methods are often used. The focus of this work is on a third major challenge, which has received far less attention than the other two: microbiome data are highly sparse, i.e., the vast majority of microbes measured are generally present in only a few samples. When a microbe is rarely observed, it is difficult for a statistical procedure to select it as a feature in a regression model. For this reason, scientists commonly discard rarely observed microbes before any analysis. In this talk, we develop a method that incorporates another view of microbiome data, namely taxonomic information, to overcome all three of these challenges. The method allows rare microbes to contribute to predictions while still yielding easily interpretable models.
|