Abstract:
|
Microbiome data obtained by sequencing the 16S rRNA gene typically consist of a large number of variables (taxa) measured on a relatively small number of samples. Ordination techniques such as principal component analysis (PCA) and correspondence analysis usually achieve dimension reduction by retaining the first two or three principal components, which are expected to explain a large proportion of the variation in the data. In microbiome data sets, however, many more than three principal components are needed to account for a reasonable amount of that variation. In our example data set, correlation-based PCA requires 10 components to account for 30% of the variation and 31 components to account for 70%. We introduce new perspectives on visualizing more than three principal components. We present R functions for visualizing taxon or sample contributions, grouped by covariates, on the first several principal components. We also present R code that processes output from ade4, an R package for ordination, for visualization in Emperor, the interactive visualization software in the QIIME pipeline. This enables interactive visualization that integrates multiple sample groupings with information from the ordination analysis.
|
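The component counts quoted above come from the cumulative proportion of variance explained. In correlation-based PCA this is the cumulative sum of the eigenvalues of the taxa's correlation matrix divided by their total. The sketch below shows that calculation. It is not the authors' R code. It uses Python with NumPy as a stand-in, and the helper name `components_needed` is hypothetical. It also assumes that taxa with zero variance across samples (absent everywhere) are dropped first, because their correlations are undefined.

```python
import numpy as np

def components_needed(X, thresholds):
    """Hypothetical helper: for each threshold, return the smallest k such that
    the first k correlation-based principal components explain at least that
    fraction of the total variance. X has samples in rows, taxa in columns."""
    X = np.asarray(X, dtype=float)
    # Assumption: drop taxa that are constant across samples (correlation undefined).
    X = X[:, X.std(axis=0) > 0]
    # Correlation-based PCA = eigen-decomposition of the correlation matrix of the taxa.
    R = np.corrcoef(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(R)[::-1]   # eigenvalues in descending order
    eigvals = np.clip(eigvals, 0.0, None)   # remove tiny negative round-off
    cum = np.cumsum(eigvals) / eigvals.sum()  # cumulative proportion of variance
    # First index where the cumulative proportion reaches the threshold, as a count.
    return {t: int(np.searchsorted(cum, t) + 1) for t in thresholds}

if __name__ == "__main__":
    # Simulated counts (40 samples x 200 taxa), NOT the example data set.
    rng = np.random.default_rng(0)
    counts = rng.poisson(5, size=(40, 200))
    print(components_needed(counts, [0.3, 0.7]))
```

When there are more taxa than samples, the correlation matrix has at most n − 1 nonzero eigenvalues for n samples. The variance is then spread over many components, which is consistent with the large component counts reported for microbiome data.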