Abstract:
|
High-throughput technologies such as RNA-sequencing and microarrays allow researchers to measure the expression of tens of thousands of genes in a single sample. Hierarchical models that describe the distribution of gene-specific parameters allow for data-dependent sharing of information across genes. Parametric hierarchical models regularize parameter estimates, but may be sensitive to model assumptions. To relax assumptions on the hierarchical distribution, we propose a semiparametric model that places a Dirichlet process prior on the distribution of gene-specific parameters, so that the underlying distribution is learned from the data. To make a fully Bayesian approach computationally tractable, we develop a parallelized Markov chain Monte Carlo algorithm that exploits general-purpose graphics processing units through embarrassingly parallel computations and parallel reductions.
|