Online Program

Thursday, May 30
Machine Learning
Deciphering Biological Systems via Innovative Statistical Learning Methods
Thu, May 30, 10:30 AM - 12:05 PM
Grand Ballroom I

Modeling Bias in Compositional Data (305057)

Amy Willis, University of Washington 
*David Clausen, University of Washington 

Keywords: microbiome, statistical learning, machine learning, batch effects, sequencing

The composition of a microbiome is an important parameter to estimate, given the critical role that microbiomes play in human and environmental health. However, profiling a microbial community with high-throughput sequencing distorts its observed composition relative to the true composition. Sequencing mock communities (artificially constructed microbiomes of known composition) clearly shows that the observed composition is a biased estimate of the true composition: certain taxa are consistently overobserved or underobserved relative to their true relative abundances. We propose a statistical learning model for bias in compositional data and illustrate its performance on data from the Vaginal Microbiome Consortium. We show how our model can be used to correct for batch-specific biases, permitting meta-analysis of microbiome studies.
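To make the idea of consistent over- and underobservation concrete, here is a minimal sketch that assumes a simple multiplicative bias model: each taxon is observed in proportion to its true abundance times a taxon-specific efficiency, and the result is renormalized to sum to 1. This is an illustrative assumption, not the model proposed in the abstract. The function names (`closure`, `observe`, `estimate_efficiency`, `correct`) and the efficiency values are hypothetical. Because efficiencies are only identifiable up to a common constant, the sketch centers them to have geometric mean 1, estimates them from a mock community of known composition, and then divides them out of a new sample processed in the same batch.

```python
import math


def closure(x):
    """Rescale nonnegative values so they sum to 1 (relative abundances)."""
    total = sum(x)
    return [v / total for v in x]


def observe(true_props, efficiency):
    """Assumed (hypothetical) multiplicative bias: the observed composition is
    the true composition reweighted by taxon-specific efficiencies."""
    return closure([p * e for p, e in zip(true_props, efficiency)])


def estimate_efficiency(observed, true_props):
    """Estimate relative efficiencies from a mock community of known composition.

    Observed/true ratios are identifiable only up to a constant, so they are
    rescaled to have geometric mean 1. Every taxon must be present in the mock.
    """
    ratios = [o / t for o, t in zip(observed, true_props)]
    log_gm = sum(math.log(r) for r in ratios) / len(ratios)
    return [r / math.exp(log_gm) for r in ratios]


def correct(observed, efficiency):
    """Undo the assumed bias by dividing out efficiencies and renormalizing."""
    return closure([o / e for o, e in zip(observed, efficiency)])


if __name__ == "__main__":
    # Hypothetical batch-specific efficiencies for three taxa.
    batch_efficiency = [2.0, 1.0, 0.5]

    # Mock community with equal true proportions, sequenced in this batch.
    mock_truth = [1 / 3, 1 / 3, 1 / 3]
    mock_observed = observe(mock_truth, batch_efficiency)
    est = estimate_efficiency(mock_observed, mock_truth)

    # A new sample from the same batch: correct it with the estimated efficiencies.
    sample_truth = [0.5, 0.3, 0.2]
    sample_observed = observe(sample_truth, batch_efficiency)
    print("observed: ", [round(v, 3) for v in sample_observed])
    print("corrected:", [round(v, 3) for v in correct(sample_observed, est)])
```

Under this assumption, the first taxon is overobserved and the third underobserved in every sample from the batch. Estimating efficiencies separately per batch is one simple way to put samples from different batches on a common scale before pooling them.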