604 – Novel Statistical Methods for RNA-Seq Data Analysis
Empirical Bayesian Analyses of High-Throughput Sequencing Data
Thomas James Hardcastle
University of Cambridge
Methods for the analysis of high-throughput sequencing data must exploit the 'large p' nature of the data (many features, such as genes or loci, measured in parallel) if they are to overcome the small sample sizes that are commonly available. This paper presents a flexible and powerful methodology for the analysis of high-throughput sequencing data based on an empirical Bayesian approach. The methods are demonstrated on two problems in high-throughput sequencing: the discovery of differential expression, and the detection of loci from genome-aligned reads. For differential expression, we show that the methods perform at least as well as the alternative approaches considered. For locus discovery, we show how, starting from an initially poor approximation to the loci, this empirical Bayesian approach can be used to bootstrap to a much-improved definition of the loci. The methods developed here form a general strategy for the analysis of high-throughput sequencing data and may, in principle, be used with any set of models and distributions for the data. We also introduce novel modifications to the basic approach that reduce the computational effort required and improve the performance of these methods.
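To illustrate how an empirical Bayesian approach can borrow strength across many features, here is a minimal Python sketch. It is not the implementation developed in this work. It makes several simplifying assumptions: a Poisson model, equal library sizes, two replicate groups, and a fixed prior probability of differential expression (`p_de`). All function names are hypothetical. For each model (one shared mean versus separate group means), parameter values estimated from a random subset of genes serve as an empirical prior. Each gene's marginal likelihood is then approximated by averaging its likelihood over those values. In this way the large number of genes compensates for the small number of replicates.

```python
import math
import random


def poisson_logpmf(k, mu):
    # log P(K = k) for K ~ Poisson(mu), mu > 0
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def log_mean_exp(values):
    # Numerically stable log(mean(exp(values)))
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def empirical_prior(counts, groups, n_prior, rng):
    # Borrow information across genes ("large p"): for a random subset of
    # genes, record the mean count within each group. These parameter values
    # are treated as draws from the prior for the model defined by `groups`.
    prior = []
    for row in rng.sample(counts, min(n_prior, len(counts))):
        means = [sum(row[i] for i in g) / len(g) for g in groups]
        prior.append([max(mu, 0.5) for mu in means])  # floor avoids log(0) (assumption)
    return prior


def log_marginal(row, groups, prior):
    # Approximate the log marginal likelihood of one gene by averaging its
    # likelihood over the empirical prior draws.
    terms = [
        sum(poisson_logpmf(row[i], mu) for g, mu in zip(groups, means) for i in g)
        for means in prior
    ]
    return log_mean_exp(terms)


def posterior_de(counts, group_a, group_b, n_prior=200, p_de=0.1, seed=0):
    # Posterior probability of differential expression for each gene,
    # comparing "one shared mean" (NDE) against "separate group means" (DE).
    # p_de is a fixed, hypothetical prior probability of the DE model.
    rng = random.Random(seed)
    nde_groups = [group_a + group_b]
    de_groups = [group_a, group_b]
    nde_prior = empirical_prior(counts, nde_groups, n_prior, rng)
    de_prior = empirical_prior(counts, de_groups, n_prior, rng)
    posts = []
    for row in counts:
        l_nde = math.log(1 - p_de) + log_marginal(row, nde_groups, nde_prior)
        l_de = math.log(p_de) + log_marginal(row, de_groups, de_prior)
        m = max(l_nde, l_de)
        posts.append(math.exp(l_de - m) / (math.exp(l_de - m) + math.exp(l_nde - m)))
    return posts


if __name__ == "__main__":
    # Toy data: 200 genes, 3 + 3 libraries of equal size; every tenth gene
    # is four-fold higher in the second group.
    counts = []
    for i in range(200):
        b = 10 + i
        counts.append([b] * 3 + ([4 * b] * 3 if i % 10 == 0 else [b] * 3))
    posts = posterior_de(counts, [0, 1, 2], [3, 4, 5])
    print(round(posts[0], 3), round(posts[1], 3))
```

The prior probability of differential expression is held fixed here for brevity; it could instead be estimated from the data. For biological replicates, a negative binomial model would usually be preferred to the Poisson because it allows for overdispersion.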