Identifying genes with differentially expressed profiles in time-course RNAseq experiments is crucial for understanding the transcriptional regulatory network. Existing pipelines do not properly account for the correlation between repeated RNAseq counts collected on the same subject. Moreover, most of them rely on empirical Bayes methodology to stabilize estimation and increase power in designs with small sample sizes. Such approaches are not always well understood by end users, and their implications in longitudinal designs have not been evaluated. The main issue is severe underestimation of the dispersion parameters due to limited information, which inflates the type I error of the tests applied.
In this work we study alternative approaches that control the type I error of the test statistics by addressing the problem directly. In particular, we consider negative binomial mixed models for the longitudinal modelling of RNAseq data and extend small-sample corrections developed for linear mixed models to count data. We study methods that estimate the denominator degrees of freedom of the F statistic from the data, as well as bootstrap-based p-values.
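
To make the idea of a bootstrap-based p-value concrete, the following is a minimal sketch and not the procedure developed in this work. It simulates negative binomial counts with a subject-level random intercept, and, as a simplifying assumption, replaces the fitted negative binomial mixed model with a repeated-measures ANOVA-type F statistic on log-transformed counts. The null parameter values (`log_mean`, `sigma_b`, `phi`) are assumed known here, whereas in practice they would be estimated from a fitted null model. All function and parameter names are hypothetical.

```python
import numpy as np


def simulate_counts(rng, n_subjects, n_times, log_mean, sigma_b, phi,
                    time_effect=None):
    """Simulate negative binomial counts (subjects x time points) with a
    subject-level random intercept, so repeated counts are correlated."""
    if time_effect is None:
        time_effect = np.zeros(n_times)  # null hypothesis: flat profile
    b = rng.normal(0.0, sigma_b, size=(n_subjects, 1))  # random intercepts
    mu = np.exp(log_mean + b + np.asarray(time_effect, dtype=float)[None, :])
    size = 1.0 / phi  # variance = mu + phi * mu**2
    return rng.negative_binomial(size, size / (size + mu))


def time_f_statistic(y):
    """Repeated-measures ANOVA-type F statistic for a time effect on
    log(count + 1). A stand-in for a mixed-model F test (assumption)."""
    z = np.log(np.asarray(y, dtype=float) + 1.0)
    n_subjects, n_times = z.shape
    z = z - z.mean(axis=1, keepdims=True)  # remove subject effects
    time_means = z.mean(axis=0)            # grand mean is zero after centering
    ss_time = n_subjects * np.sum(time_means ** 2)
    ss_resid = np.sum((z - time_means) ** 2)
    df_time = n_times - 1
    df_resid = (n_subjects - 1) * (n_times - 1)
    return (ss_time / df_time) / (ss_resid / df_resid)


def bootstrap_p_value(t_obs, null_stats):
    """Proportion of null statistics at least as large as the observed one,
    with the usual +1 correction so the p-value is never exactly zero."""
    null_stats = np.asarray(null_stats, dtype=float)
    return (1.0 + np.sum(null_stats >= t_obs)) / (null_stats.size + 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # "Observed" data for one gene: 6 subjects, 4 time points, rising profile.
    y_obs = simulate_counts(rng, 6, 4, log_mean=4.0, sigma_b=0.5, phi=0.2,
                            time_effect=[0.0, 0.5, 1.0, 1.5])
    t_obs = time_f_statistic(y_obs)
    # Null distribution: same design, no time effect (assumed parameters).
    null_stats = [
        time_f_statistic(simulate_counts(rng, 6, 4, 4.0, 0.5, 0.2))
        for _ in range(999)
    ]
    print("bootstrap p-value:", bootstrap_p_value(t_obs, null_stats))
```

The appeal of this construction is that the reference distribution is generated under the same small-sample design as the observed data, so it does not depend on an asymptotic approximation or on a particular choice of denominator degrees of freedom.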