Abstract:
|
Targeted amplicon sequencing data, including 16S rRNA and ITS sequence data, are inherently compositional. Using these data in regression tasks is therefore challenging because of the constant-sum constraint. In addition, typical microbiome data are overdispersed and zero-inflated. To address these challenges, we present novel concomitant regression models for microbiome data in which the regression vector and the scales are estimated jointly. The resulting estimation problems admit convex optimization formulations that can be solved efficiently using proximal algorithms. We show improved prediction performance compared to state-of-the-art methods on both synthetic and real microbiome data, ranging from host-associated to environmental amplicon data.
|