Abstract:
|
Mega-analysis, which integrates multiple batches of data, boosts the power to detect associations between microbiome data and clinical variables. However, microbiome data can suffer from batch effects, which lead to excess false positives and false negatives. Most existing microbiome batch adjustment strategies rely on approaches originally designed for genomic analysis. Many of them assume Gaussian linear or negative binomial regression models and therefore fail to adequately address the zero inflation, dispersion and heterogeneity of microbiome data. Other strategies tailored to microbiome data can only be used for association testing and do not support other analytic goals such as visualization. We developed a batch correction method, ConQuR, which uses a two-part quantile regression model to account for both the inflated zeros and the complex distributional attributes of the non-zero measures. It preserves the zero-inflated integer nature of microbiome data, so the corrected data are compatible with any subsequent normalization and analysis. We applied ConQuR to several real datasets and showed that it outperforms existing methods in removing batch effects and boosting the power to detect associations.
|