Abstract:
Data from a 16S rRNA microbiome study are typically an n × p table of counts (n = sample size, p = number of taxa). Calculating a matrix of pairwise distances between samples and then ordinating them (plotting the samples in two or three dimensions using principal coordinates) often reveals striking differences between meaningful groups (e.g., cases vs. controls). However, methods for testing hypotheses about these observed patterns, or for finding the taxa that contribute most to them, are not well developed. We present a new distance-based linear decomposition model that assesses the importance of explanatory variables (e.g., case/control status) and of individual taxa by the proportion of variability in the data that they explain. Any distance matrix can be used. Because the variance explained is easy to calculate, its significance can be established by permutation. We also consider the effect of data transformation on the power of our tests. Using simulation, we show that our approach can have more power to detect overall association than PERMANOVA and MiRKAT. It also identifies more taxa than DESeq, and these taxa tend to be more important (as measured by the mean difference in frequencies between groups).
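The abstract does not give the model's equations. As a rough illustration of the distance-based quantities it relies on, the sketch below makes several assumptions. It computes a distance matrix from an n × p count table, using Bray-Curtis as one arbitrary choice among the "any distance matrix" options. It Gower-centres the squared distances. It then measures the proportion of total variability, tr(G), captured by a case/control variable. This is the PERMANOVA-style R², tr(HGH)/tr(G), with significance assessed by permuting the labels. This is a generic sketch, not the authors' linear decomposition model. The function names and the simulated data are hypothetical.

```python
import numpy as np

def bray_curtis(counts):
    """Pairwise Bray-Curtis dissimilarities between the rows (samples) of an n x p count table."""
    x = np.asarray(counts, dtype=float)
    n = x.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            den = (x[i] + x[j]).sum()
            d[i, j] = d[j, i] = np.abs(x[i] - x[j]).sum() / den if den > 0 else 0.0
    return d

def gower_center(d):
    """Gower-centred matrix G = -1/2 * C D^2 C, where C = I - 11'/n is the centring matrix."""
    n = d.shape[0]
    c = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * c @ (d ** 2) @ c

def variance_explained(g, covariate):
    """Share of tr(G) lying in the column space of [1, covariate], via the hat matrix H."""
    x = np.column_stack([np.ones(g.shape[0]), covariate])
    h = x @ np.linalg.pinv(x)  # orthogonal projection onto the column space of x
    return np.trace(h @ g @ h) / np.trace(g)

def permutation_test(g, covariate, n_perm=999, seed=0):
    """Observed variance explained and a permutation p-value from shuffling the covariate."""
    rng = np.random.default_rng(seed)
    observed = variance_explained(g, covariate)
    hits = sum(
        variance_explained(g, covariate[rng.permutation(len(covariate))]) >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical example: 10 "controls" and 10 "cases", 4 taxa, with taxa 1 and 2 swapping abundance.
rng = np.random.default_rng(1)
group = np.repeat([0, 1], 10)
means = np.where(group[:, None] == 0, [50, 5, 20, 20], [5, 50, 20, 20])
counts = rng.poisson(means)

d = bray_curtis(counts)
g = gower_center(d)
r2, p = permutation_test(g, group, n_perm=199)
print(f"variance explained = {r2:.3f}, permutation p = {p:.3f}")
```

Only tr(G) and a projection are needed for each permutation, so this statistic is cheap to recompute. That cheapness is what makes permutation-based significance practical here.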