Abstract:
|
Many statistical methods have been proposed for RNA-Seq differential analysis and implemented in open-source software such as R/Bioconductor. Unfortunately, for RNA-Seq experiments with small sample sizes, the false discovery rate (FDR) of all current methods is much higher than the nominal level (5%), so follow-up validation of the selected significant genes is likely to encounter many false discoveries. We developed an ensemble method for differential gene expression analysis that integrates three established RNA-Seq differential analysis methods (voom, DESeq2, and SAMseq). We also developed a novel weighting algorithm for ranking the top candidates. Our simulation studies showed that the ensemble method controls the FDR within the nominal level, whereas each of the three component methods has an FDR well above it. The weighting algorithm, which is based on kappa statistics, ranks the selected significant genes according to computed weighted rank values. The top candidate genes selected by our ensemble method differ from those selected by any of the three methods individually. We also applied our ensemble method to a real GEO RNA-Seq data set related to osteoblasts (GSE79814) for gene selection.
|