Abstract:
|
In the analysis of RNA-Seq data, detecting differentially expressed (DE) genes has been an active research area in recent years, and many methods have been proposed. DE genes show different average expression levels across sample groups and can therefore serve as important biological markers. Although these methods are generally very successful, they need further tailoring and improvement for cancer data. Such data often show highly diverse expression in the cancer group, with some samples appearing as extreme outliers, and this diversity is often much greater than in the control group. We propose a statistical method, DiPhiSeq, that detects not only genes with different average expression but also genes with different diversity of expression across groups. These "differentially dispersed" genes can be important clinical markers. Our method applies a redescending penalty to the quasi-likelihood function and thus has superior robustness against outliers and other noise. Simulations and real data analysis demonstrate that DiPhiSeq outperforms existing methods in the presence of outliers and identifies unique sets of genes. DiPhiSeq is available as an R package on CRAN.
|
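As a rough illustration of why a redescending penalty resists outliers, the sketch below uses Tukey's biweight, one common redescending function. It is chosen here as an assumption: the abstract does not say which penalty DiPhiSeq uses, and this is not its quasi-likelihood fit. The helper names (`tukey_rho`, `tukey_psi`, `biweight_location`), the tuning constant c = 4.685, and the toy log-expression values are all hypothetical. The key property is that the influence function `tukey_psi` returns to exactly zero for large residuals, so an extreme sample stops pulling on the estimate, whereas the ordinary mean is dragged toward it.

```python
import numpy as np

def tukey_rho(r, c=4.685):
    """Tukey biweight loss: about r^2/2 near 0, constant c^2/6 once |r| >= c."""
    r = np.asarray(r, dtype=float)
    return np.where(np.abs(r) < c,
                    (c**2 / 6.0) * (1.0 - (1.0 - (r / c) ** 2) ** 3),
                    c**2 / 6.0)

def tukey_psi(r, c=4.685):
    """Derivative of tukey_rho; it redescends to exactly 0 for |r| >= c."""
    r = np.asarray(r, dtype=float)
    return np.where(np.abs(r) < c, r * (1.0 - (r / c) ** 2) ** 2, 0.0)

def biweight_location(x, c=4.685, n_iter=50):
    """Hypothetical helper: robust location via iteratively reweighted means,
    with residuals scaled by the normalized median absolute deviation (MAD)."""
    x = np.asarray(x, dtype=float)
    mu = np.median(x)
    scale = 1.4826 * np.median(np.abs(x - mu))
    if scale == 0.0:
        return mu  # degenerate spread: fall back to the median
    for _ in range(n_iter):
        r = (x - mu) / scale
        # Biweight weights psi(r)/r; samples with |r| >= c get weight 0.
        w = np.where(np.abs(r) < c, (1.0 - (r / c) ** 2) ** 2, 0.0)
        mu = np.sum(w * x) / np.sum(w)
    return mu

# Toy log-scale expression for nine samples; the last one is a huge outlier.
x = [5.0, 5.2, 4.9, 5.1, 5.0, 4.8, 5.3, 5.0, 12.0]
print("mean:", round(float(np.mean(x)), 3))  # pulled upward by the outlier
print("biweight:", round(float(biweight_location(x)), 3))
```

Here the outlier's scaled residual is far beyond c, so it receives zero weight and the biweight estimate stays near the bulk of the samples. A quadratic (least-squares) loss, by contrast, gives that sample influence proportional to its residual.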