
Abstract Details

Activity Number: 528 - Analysis of Big Data
Type: Contributed
Date/Time: Wednesday, August 1, 2018 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #330389 Presentation
Title: High-Dimensional Regression for Microbiome Compositional Data
Author(s): Xiaohan Yan* and Jacob Bien and Christian Mueller
Companies: Cornell University and University of Southern California and Flatiron Institute
Keywords: microbiome data; compositional data; high-dimensional statistics; OTU

Microbiome data record the relative abundances of microbial entities, called operational taxonomic units (OTUs), that are present in various environments. Analyzing such data is challenging for three main reasons: (i) they are compositional, i.e., the absolute abundances are not known; (ii) they are high-dimensional, i.e., the number of microbes is large; and (iii) they are highly sparse, i.e., most microbes are absent from most samples. Many existing methods carefully address the first two challenges but handle the third, data sparsity, in an ad hoc way. For example, authors sometimes manually aggregate OTUs to the genus or family level and/or simply filter out rare microbial entities. We instead propose a principled regression framework that addresses all three challenges. In particular, our method uses phylogenetic information to automate the aggregation process in a data-driven manner. We show that, on microbiome data, our approach outperforms existing methods.
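To make the three challenges and the ad hoc preprocessing concrete, here is a minimal Python sketch of the baseline practice the abstract contrasts with (fixed aggregation to genus level and prevalence filtering). It is not the authors' regression method. The count table, OTU names, and genus assignments (COUNTS, OTUS, GENUS) are hypothetical toy values, and the helpers closure, aggregate, filter_rare, and zero_fraction are illustrative names, not a published API.

```python
from collections import defaultdict

# Hypothetical toy OTU count table: rows are samples, columns are OTUs.
# It is sparse on purpose: most OTUs are absent (zero) in most samples.
COUNTS = [
    [10, 0, 0, 5, 0, 0],
    [0, 3, 0, 0, 0, 7],
    [4, 0, 2, 0, 0, 0],
]
OTUS = ["otu1", "otu2", "otu3", "otu4", "otu5", "otu6"]

# Hypothetical OTU -> genus assignment (one fixed level of a taxonomy).
GENUS = {
    "otu1": "genusA", "otu2": "genusA", "otu3": "genusA",
    "otu4": "genusB", "otu5": "genusB", "otu6": "genusB",
}


def closure(row):
    """Rescale a nonnegative count vector to relative abundances summing to 1."""
    total = sum(row)
    return [x / total for x in row]


def aggregate(row, otus, group_of):
    """Sum the abundances of OTUs that share a group (e.g., the same genus)."""
    sums = defaultdict(float)
    for value, otu in zip(row, otus):
        sums[group_of[otu]] += value
    return dict(sorted(sums.items()))


def filter_rare(table, otus, min_prevalence):
    """Keep only OTUs that are nonzero in at least min_prevalence samples."""
    keep = [j for j in range(len(otus))
            if sum(1 for row in table if row[j] > 0) >= min_prevalence]
    return [[row[j] for j in keep] for row in table], [otus[j] for j in keep]


def zero_fraction(table):
    """Fraction of zero entries in a table (a simple sparsity measure)."""
    cells = [x for row in table for x in row]
    return sum(1 for x in cells if x == 0) / len(cells)


if __name__ == "__main__":
    comps = [closure(r) for r in COUNTS]
    # Compositional: doubling a sample's counts (e.g., deeper sequencing)
    # leaves its relative abundances unchanged.
    assert closure([2 * x for x in COUNTS[0]]) == comps[0]

    # Fixed, manual aggregation to genus level reduces sparsity.
    genus_table = [aggregate(r, OTUS, GENUS) for r in comps]
    print("OTU-level zero fraction:", round(zero_fraction(COUNTS), 2))
    print("genus-level zero fraction:",
          round(zero_fraction([list(g.values()) for g in genus_table]), 2))

    # Prevalence filtering discards every OTU seen in fewer than 2 samples.
    _, kept = filter_rare(COUNTS, OTUS, min_prevalence=2)
    print("OTUs kept after filtering:", kept)
```

On this toy table, aggregation lowers the fraction of zeros and filtering keeps only otu1, discarding most of the data. In both cases the grouping or threshold is chosen in advance rather than learned from the data, which is the gap the abstract's data-driven, phylogeny-guided aggregation is meant to close.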

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program