Abstract:
|
Normalization of high-throughput RNA sequencing (RNA-seq) data and statistical testing are essential steps for identifying differentially expressed genes (DEGs). The most commonly used normalization methods are TMM (trimmed mean of M-values), RLE (relative log expression), and UQ (upper quartile). The most common statistical tests are the Wald test from DESeq2 and the exact test from edgeR. Although several comparative studies have reported that DESeq is more conservative than edgeR, both failed to maintain the false discovery rate below the nominal level of 0.05. Recently, we observed that UQ-pgQ2 normalization combined with the exact test from edgeR has better specificity for DEG analysis on benchmark MAQC data and on simulated data with small sample sizes (few replicates). However, for larger sample sizes, it remains uncertain whether the exact test performs better than the Wald test. To address this question, we evaluated the performance of these normalization methods combined with the two tests. We observed that the Wald test performs better than the exact test in controlling false positives when sample sizes are large. However, the exact test combined with UQ-pgQ2 is the best choice when sample sizes are small.
|