Abstract:
|
The Bayesian paradigm is attractive in big data settings such as text and image processing because it allows for the construction of rich and understandable models, along with propagation of inferential uncertainty to predictive uncertainty. However, most forms of modern Bayesian computation do not scale to these problems. To make Bayesian inference tractable in these settings, researchers typically either alter the models being fitted so that they lead to scalable algorithms, or develop novel algorithms with favorable complexity-theoretic properties in the standard setting. In this talk, we present a third approach: adapting Bayesian algorithms to the novel computational environments in which big data is typically found. We review and compare the characteristics of these environments, showing that different types of parallelism impose different requirements and have different performance consequences. We demonstrate that, although many Bayesian methods such as Markov chain Monte Carlo algorithms are inherently iterative, techniques such as Asynchronous Gibbs Sampling allow them to be successfully adapted to massively parallel architectures.
|