Abstract:
|
Compositional data lie on a simplex: the components of a composition must sum to one, so traditional statistical tests designed for unconstrained data become inappropriate. In this paper, we consider the general problem of testing for compositional differences among K populations. Motivated by microbiome and metagenomics studies, where the data are often over-dispersed and high-dimensional, we formulate a well-posed hypothesis from a Bayesian point of view and propose a nonparametric test based on similarity graphs for evaluating statistical significance. Unlike existing methods, our approach does not rely on any data transformation or covariance matrix estimation, but analyzes the compositions directly. We evaluate the performance of the proposed test in high dimensions using simulated data. We then apply the new method to reanalyze a real microbiome dataset and study the difference in throat microbiome between smokers and nonsmokers.
|