Abstract:
|
Divide-and-conquer is a natural computational paradigm for large-scale data analysis, particularly given recent developments in distributed and parallel computing. Interesting challenges arise, however, when the paradigm is applied to statistical inference. For example, the widths of both frequentist and Bayesian interval estimates classically scale with the inverse square root of the number of points in a sample, so a naive interval estimate computed on a subsample is on the wrong scale. In the frequentist setting, I discuss how this problem can be addressed by the "bag of little bootstraps," a resampling-based procedure in which the bootstrap is applied to multiple subsamples. In the Bayesian setting, I discuss "variational consensus Monte Carlo," a distributed Monte Carlo procedure in which multiple samples are combined within a variational optimization framework.
|
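As a rough illustration of the scale problem and of how resampling on subsamples can address it, here is a minimal Python sketch of a bag-of-little-bootstraps style estimate of the standard error of a sample mean. It is not the speaker's implementation: the function name `blb_stderr`, its parameters (`n_subsets`, `n_resamples`, `gamma`), and the synthetic Gaussian data are all assumptions made for this example. The key step is that each small subsample of size b is resampled with multinomial counts that sum to the full sample size n, so the resulting spread is on the scale of n rather than b.

```python
import numpy as np


def blb_stderr(data, n_subsets=10, n_resamples=50, gamma=0.7, seed=0):
    """Illustrative BLB-style standard error of the mean (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = len(data)
    b = int(n ** gamma)  # subsample size, much smaller than n
    per_subset = []
    for _ in range(n_subsets):
        # Draw a small subsample without replacement.
        subset = rng.choice(data, size=b, replace=False)
        # Each resample assigns counts to the b points that sum to n,
        # mimicking a full-size bootstrap sample built from b distinct values.
        counts = rng.multinomial(n, np.full(b, 1.0 / b), size=n_resamples)
        means = counts @ subset / n  # weighted mean for each resample
        per_subset.append(means.std(ddof=1))
    # Average the per-subsample quality assessments.
    return float(np.mean(per_subset))


# Synthetic data, assumed only for this illustration.
data = np.random.default_rng(1).normal(size=20_000)
b = int(len(data) ** 0.7)

blb = blb_stderr(data)
classical = data.std(ddof=1) / np.sqrt(len(data))  # full-sample standard error
naive = data[:b].std(ddof=1) / np.sqrt(b)          # subsample scale: too wide

print(f"BLB: {blb:.5f}  classical: {classical:.5f}  naive subsample: {naive:.5f}")
```

In this sketch the naive subsample estimate is wider than the full-sample value by roughly a factor of sqrt(n/b), while the resampled estimate should land close to the full-sample value even though each resample touches only b distinct points.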