Abstract:
|
In RNA-seq differential expression analysis, investigators aim to detect genes whose expression level changes across experimental conditions, despite technical and biological variability in the observations. A fundamental challenge is to accurately estimate the effect size, often expressed as a logarithmic fold change (LFC) across conditions. When the counts of sequenced reads are small in either or both conditions, the estimated LFC has high variance, which produces some large estimated LFCs that do not represent true differences in expression. Current methods introduce arbitrary filtering thresholds and pseudocounts to exclude or moderate the estimated LFC from genes that have small read counts. These methods may remove from the analysis genes that have true differences across conditions. Here, we propose an empirical Bayes procedure with a wide-tailed prior on effect sizes, which avoids defining arbitrary filter thresholds or pseudocounts. We show that our new estimator for LFC is efficient to calculate and has lower bias than previously proposed shrinkage estimators, while still reducing variance for those genes with little statistical information.
|