Abstract:
|
A common step in the analysis of microbiome data is to visualize the samples using multidimensional scaling in combination with a phylogenetic tree-based distance such as weighted UniFrac. However, weighted UniFrac weights the deep branches of the tree so heavily that true tree-related variation present in the data is often suppressed in the resulting ordination. I give a mathematical explanation for this overweighting of the deep branches and present a new method that automatically decides how much weight to give to the deep branches relative to the shallower ones. This method is based on a new interpretation of generalized PCA and can be used to make any kind of structured ordination (e.g., pathway-based ordination of gene expression or metabolite levels) in addition to phylogenetically structured ordinations. I illustrate the method on both simulated data and a real microbiome data set.
|