Abstract:
A limitation of commonly used Bayesian approaches to variable selection is scalability: posterior inference relies on MCMC, which is time consuming and scales poorly to large data. Motivated by recent work by Rockova and George (2013), we derive an efficient EM algorithm that retrieves the variable subset with the highest posterior probability (the MAP estimate). An appealing feature of our EM algorithm is that, thanks to a computational trick that exploits a special structure of the algorithm, we do not need to invert a large matrix in each iteration, a step that is unavoidable in many other algorithms. We further propose an ensemble approach to variable selection based on the Bayesian bootstrap: the main idea is to repeatedly apply a stochastic version of our EM algorithm to a subset of the data (i.e., the Bayesian bootstrap samples), and then aggregate the variable selection results across those bootstrap experiments. Empirical studies show that the bootstrap-EM method is much faster and more accurate than the original EM, especially in the large-p, small-n scenario.
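The Bayesian-bootstrap aggregation described above can be sketched as follows. This is a minimal illustration, not the authors' method: `select_variables` is a hypothetical stand-in (a weighted-correlation screen) for the stochastic EM selector, which the abstract does not specify; only the Dirichlet weighting and the aggregation of selections across replicates follow the described scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n = 50 observations, p = 10 predictors, only the first 2 active.
n, p = 50, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:2] = 2.0
y = X @ beta + rng.standard_normal(n)

def select_variables(X, y, w, threshold=0.3):
    """Hypothetical stand-in for the stochastic EM selector: score each
    predictor by its weighted correlation with y and keep those above a
    threshold. The actual method would run EM to a MAP spike-and-slab
    configuration under observation weights w."""
    Xc = X - np.average(X, axis=0, weights=w)
    yc = y - np.average(y, weights=w)
    num = (w[:, None] * Xc * yc[:, None]).sum(axis=0)
    den = np.sqrt((w[:, None] * Xc**2).sum(axis=0) * (w * yc**2).sum())
    return np.abs(num / den) > threshold

B = 200
counts = np.zeros(p)
for _ in range(B):
    # Bayesian bootstrap: Dirichlet(1, ..., 1) weights over observations.
    w = rng.dirichlet(np.ones(n))
    counts += select_variables(X, y, w)

# Aggregate: per-variable inclusion frequency across bootstrap replicates.
inclusion_freq = counts / B
selected = np.where(inclusion_freq > 0.5)[0]
```

Aggregating inclusion indicators across replicates yields a frequency per variable, which can then be thresholded (here at 0.5, an arbitrary choice) to produce the final subset.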
ASA Meetings Department
Copyright © American Statistical Association.