Abstract:
|
The growing accessibility of RNA sequencing (RNA-Seq) technology has led to more complicated study designs that demand analyses beyond what current methods were designed to handle. The most popular analysis tools for RNA-Seq data, edgeR and DESeq2, are designed for studies that include only fixed effects, and thus do not account for the correlation between repeated or clustered measurements or for other random effects. In this work, we propose a Bayesian hierarchical negative binomial model for analyzing RNA-Seq data that naturally allows for the inclusion of random effects. Model parameters are estimated using Markov chain Monte Carlo (MCMC) methods with a Weighted Least Squares proposal distribution for better mixing of regression parameters (available in the MCMSeq R package). We compare the MCMC results to edgeR, DESeq2, and traditional generalized linear mixed models in terms of power, type I error rate, false discovery rate (FDR), and mean squared error (MSE) of regression coefficients on simulated data. Results show that the Bayesian model better controls type I errors at the small alpha levels needed to effectively adjust for multiple comparisons (e.g., 0.001), and also better controls FDR.
|