Abstract:
|
Filtering is a commonly used approach for removing rare taxa that may have been generated by contamination or by misclassification of taxa. It reduces the extreme sparsity of microbiome data, allowing researchers to make effective use of well-developed lower-dimensional methods. Here, we assess the effect of filtering on alpha and beta diversity estimation, as well as its impact on identifying taxa that discriminate between disease states. In microbiome quality control datasets, where samples containing the same bacteria were processed at different labs, filtering reduces the magnitude of differences in alpha diversity and alleviates technical variability between labs, while preserving between-sample similarity (beta diversity). In disease study datasets, where a random forest model and the LEfSe method are used to identify signal taxa in a classification problem, filtering retains important taxa and preserves the classification power of the model. Filtering also mitigates the sensitivity of classification methods to extremely rare taxa. A comparison between filtering and a contaminant removal method shows that the two have complementary effects.
|