Abstract:
|
Elucidating microbial interactions is key to understanding the ecological laws governing microbial communities. High-throughput sequencing offers new opportunities to observe interactions across thousands of uncultured, unknown microbes. However, microbiome datasets are high-dimensional, and accurate estimation of microbial correlations requires thousands of samples, exceeding the computational capabilities of existing methodologies. Furthermore, sequencing count data are compositional, which confounds the inference of microbial correlations. The Multinomial Logistic Normal (MLN) distribution has been shown to be effective at inferring microbial correlations, but scalable estimation remains challenging. We show that Variational Autoencoders (VAEs) augmented with the isometric log-ratio (ILR) transform can estimate MLN distributions thousands of times faster than existing methods. These VAEs can be trained on thousands of samples, enabling co-occurrence inference across thousands of microbes. They are also competitive with existing beta-diversity methods across a variety of mouse and human microbiome classification tasks, with improvements on longitudinal studies.
|