Abstract:
|
Analysis of sequencing-enabled microbiome data poses many challenges that result from the combination of small sample sizes, low resolution, sparsity and high-dimensional measurements. Further, low resolution and sparsity make it difficult to analyze the data under the assumption of log-normality that is commonly made for other sequencing-based omics data. Fortunately, measurements of microbial communities exhibit rich correlation structure that can be leveraged within a dimensionality reduction paradigm to uncover associations with biological processes related to health and disease. To this end, we present a discriminative factor model based on non-negative matrix factorization that addresses the challenges of microbiome data, as well as the need to identify microbial communities associated with phenotypes of interest. Experiments on artificial and real-world microbiome datasets illustrate the capabilities of the proposed approach in relation to existing methods.
|