Abstract:
|
Due to the continuing growth of big data sets, new parallel computing methods for Bayesian Markov chain Monte Carlo (MCMC) have been developed. These methods partition large data sets into subsets by observation. However, many Bayesian hierarchical models have only a small number of parameters that are common to the full data set, with the majority of parameters being group specific. Techniques that split the full data set by group rather than by observation are therefore a more natural approach to the analysis. Here, we adapt and extend one such two-stage Bayesian hierarchical modelling method. In stage 1, each group is analysed independently and in parallel; in stage 2, the stage 1 posteriors are used as proposal distributions to estimate the full model. We illustrate our approach using both simulated and real data sets. Our results show considerable increases in MCMC efficiency and large reductions in computation time compared to the full data analysis.
|
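As an illustration of the two-stage idea only, the following minimal sketch applies it to a toy normal hierarchical model. Everything in it is an assumption made for the example and is not taken from the paper: the model (y_ij ~ N(theta_i, sigma^2), theta_i ~ N(mu, tau^2), with sigma and tau known and a flat prior on mu), the flat stage 1 prior on each theta_i, and the hypothetical function names `stage1_draws`, `stage2` and `demo`. Because the stage 1 posterior is available in closed form here, it is sampled directly rather than by MCMC, and the groups are processed in a serial loop although each call is independent and could be run in parallel.

```python
import math
import random


def stage1_draws(y, sigma, n_draws, rng):
    """Stage 1 (toy version): draws from p(theta_i | y_i) under a flat prior.

    With a normal likelihood and known sigma this posterior is
    N(ybar, sigma^2 / n), so it is sampled directly instead of by MCMC.
    """
    ybar = sum(y) / len(y)
    sd = sigma / math.sqrt(len(y))
    return [rng.gauss(ybar, sd) for _ in range(n_draws)]


def log_norm(x, mean, sd):
    """Log density of N(mean, sd^2) at x, up to an additive constant."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd)


def stage2(draws, tau, n_iter, rng):
    """Stage 2: fit the full model using stage 1 draws as proposals."""
    n_groups = len(draws)
    theta = [d[0] for d in draws]
    mu = sum(theta) / n_groups
    mu_trace = []
    for _ in range(n_iter):
        for i in range(n_groups):
            # Independence proposal: a random draw from the stage 1 pool.
            prop = rng.choice(draws[i])
            # The likelihood cancels against the proposal density, and the
            # flat stage 1 prior cancels too, leaving only the ratio of
            # the hierarchical densities p(theta | mu).
            log_ratio = log_norm(prop, mu, tau) - log_norm(theta[i], mu, tau)
            if rng.random() < math.exp(min(0.0, log_ratio)):
                theta[i] = prop
        # Gibbs update for mu under a flat prior: N(mean(theta), tau^2 / G).
        mu = rng.gauss(sum(theta) / n_groups, tau / math.sqrt(n_groups))
        mu_trace.append(mu)
    return mu_trace


def demo(seed=1):
    """Simulate 8 groups of 20 observations and compare with the exact answer."""
    rng = random.Random(seed)
    sigma, tau, true_mu = 1.0, 1.0, 2.0
    groups = []
    for _ in range(8):
        theta_i = rng.gauss(true_mu, tau)
        groups.append([rng.gauss(theta_i, sigma) for _ in range(20)])
    # Each call depends only on its own group, so this loop is parallelisable.
    draws = [stage1_draws(y, sigma, 2000, rng) for y in groups]
    trace = stage2(draws, tau, 4000, rng)[500:]  # discard burn-in
    mcmc_mean = sum(trace) / len(trace)
    # With equal group sizes and a flat prior on mu, the exact posterior
    # mean of mu is the average of the group sample means.
    exact_mean = sum(sum(y) / len(y) for y in groups) / len(groups)
    return mcmc_mean, exact_mean
```

Calling `demo()` returns the stage 2 estimate of the posterior mean of mu alongside its closed-form value for this toy model, so the two can be compared directly. Note that stage 2 never touches the raw observations: it only needs the stored stage 1 draws, which is why the group-level likelihood does not have to be re-evaluated.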