Abstract:
|
As gene expression measurement technology shifts from microarrays to sequencing, the statistical tools available for analyzing these data must be adapted, since RNAseq data are measured as counts. It has been proposed to model RNAseq counts as continuous variables, using nonparametric regression to account for their inherent heteroscedasticity. In this vein, we propose varseq, a principled, model-free, and efficient method for detecting changes in RNAseq data. Unlike methods that model RNAseq counts with the negative binomial distribution (such as edgeR or DESeq2), our approach does not rely on any distributional assumptions. We show how such assumptions can lead to inflated type I error and demonstrate the robustness of our approach. Additionally, varseq can easily accommodate gene set analysis and complex longitudinal study designs. We further demonstrate the utility of varseq by analyzing data from an Ebola vaccine trial. The software is available in the R package tcgsaseq.
|