Abstract:
|
In high-throughput sequencing studies, uneven sequencing depth introduces biases to feature profiles, thus obscuring true biological signals of interest and making observed counts incomparable directly between samples. Current normalization methods assuming a constant sample-specific size factor may lead to over or under correction in some cases for microbiome data, since the size factor is usually driven by very high abundant features and extra zeros. To address this, we propose a novel normalization method based on zero-inflated negative binomial regression to model the relationship between mean abundance with sequencing depth. Such a model allows sequencing depth to have differential effects on taxa. Instead of fixed dispersion parameters in traditional methods, we also allow a covariate-dependent dispersion parameter to account for sample heterogeneity. The post-normalized counts reflect the taxa abundance levels that are independent of sequencing depth for downstream analysis. Simulation studies and real data applications show a good performance of our normalization method on eliminating read depth variation and aiding in data interpretation and visualization.
|