Abstract:
In the era of big data, statisticians face new challenges: data sets grow unexpectedly large as new observations accumulate. Efficiently combining information from an enormous body of old data to produce online parameter estimates as new data arrive has become a necessity. Traditionally, parameter estimation is based on a fixed pool of data; in this paper, we introduce a Bayesian online learning methodology for a dynamic stream of data, motivated by computational efficiency and storage economy. The proposed sequential batch inference proceeds as follows: cut the old data into smaller batches and treat the new data as the last batch; the posterior of the previous batch, approximated by a piecewise-constant density constructed from posterior samples, serves as the prior for the current batch, whose posterior in turn serves as the prior for the next batch. Full Bayesian computation is required only within each batch, which is a much smaller data set. Under this framework, we can conduct Bayesian online learning on big-data problems. The proposed online learning algorithm gives exact Bayesian inference for a wide range of models, e.g., independent observations and state-space models.
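The sequential batch scheme sketched in the abstract can be illustrated with a minimal toy example. The sketch below is an assumption-laden illustration, not the authors' implementation: it uses a simple i.i.d. N(theta, 1) model with a density represented on a fixed grid, so the piecewise-constant carry-over from one batch's posterior to the next batch's prior is explicit.

```python
import numpy as np

# Hypothetical illustration of sequential batch Bayesian updating:
# a piecewise-constant density on a grid carries one batch's posterior
# forward as the prior for the next batch. Model: i.i.d. N(theta, 1).

rng = np.random.default_rng(0)
true_theta = 2.0
data = rng.normal(true_theta, 1.0, size=1000)
batches = np.array_split(data, 10)      # cut the data stream into batches

grid = np.linspace(-5.0, 5.0, 2001)     # support of the piecewise-constant density
log_prior = np.zeros_like(grid)         # flat initial prior (log scale, unnormalized)

for batch in batches:
    # log-likelihood of the whole batch evaluated at each grid point
    log_lik = -0.5 * ((batch[:, None] - grid[None, :]) ** 2).sum(axis=0)
    log_post = log_prior + log_lik
    log_post -= log_post.max()          # stabilize before exponentiating
    post = np.exp(log_post)
    post /= np.trapz(post, grid)        # normalize to a density on the grid
    log_prior = np.log(post + 1e-300)   # posterior becomes the next batch's prior

posterior_mean = np.trapz(grid * post, grid)
print(posterior_mean)                   # close to the full-data posterior mean
```

Because the model is conjugate under a flat prior, the sequential result here coincides with the full-data posterior; in the general setting described above, each batch's posterior would instead be summarized by samples before being passed on.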
Copyright © American Statistical Association.