
Abstract Details

Activity Number: 280 - Leading the Stream: Novel Methods for Streaming Data
Type: Invited
Date/Time: Tuesday, July 31, 2018 : 8:30 AM to 10:20 AM
Sponsor: Business and Economic Statistics Section
Abstract #326508 Presentation
Title: Automated Bayesian Inference for Large-Scale Datastreams
Author(s): Trevor Campbell* and Tamara Broderick
Companies: Massachusetts Institute of Technology and Massachusetts Institute of Technology
Keywords: Bayesian; coresets; Hilbert; vector; Frank-Wolfe; inference

The automation of posterior inference in Bayesian data analysis has enabled experts and nonexperts alike to use more sophisticated models, engage in faster exploratory modeling and analysis, and ensure experimental reproducibility. However, standard automated posterior inference algorithms are not tractable at the scale of massive modern datasets, and modifications to make them so are typically model-specific, require expert tuning, and can break theoretical guarantees on inferential quality. This talk will instead take advantage of data redundancy to shrink the dataset itself as a preprocessing step, forming a "Bayesian coreset". The coreset can be used in a standard inference algorithm at significantly reduced cost while maintaining theoretical guarantees on posterior approximation quality. The talk will include an intuitive formulation of Bayesian coreset construction as sparse vector sum approximation, two automated coreset construction algorithms that take advantage of this formulation, theoretical guarantees on posterior approximation quality, and numerous applications to real large-scale data analysis problems.
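As a rough illustration of the "sparse vector sum approximation" view mentioned in the abstract, the sketch below runs a generic Frank-Wolfe iteration that approximates the sum of N vectors with a nonnegative weighted sum of only a few of them. It is a minimal sketch under stated assumptions, not the talk's algorithms: each data point is assumed to be already represented by a finite-dimensional, nonzero vector (for example, a hypothetical projection of its log-likelihood function), and the function name `frank_wolfe_sparse_sum`, its arguments, and the norm-weighted constraint are choices made for this example.

```python
import numpy as np


def frank_wolfe_sparse_sum(vectors, num_iters):
    """Return nonnegative weights w with at most `num_iters` nonzeros
    such that w @ vectors approximates vectors.sum(axis=0).

    Illustrative assumption: every row of `vectors` is nonzero.
    """
    vectors = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=1)
    if np.any(norms == 0):
        raise ValueError("all vectors must be nonzero")
    total_norm = norms.sum()
    target = vectors.sum(axis=0)  # the full sum we want to approximate

    # Feasible set: w >= 0 with sum_n norms[n] * w[n] == total_norm.
    # Its vertices put all mass on one index: (total_norm / norms[n]) * e_n.
    weights = np.zeros(len(vectors))
    first = int(np.argmax(vectors @ target / norms))
    weights[first] = total_norm / norms[first]

    for _ in range(num_iters - 1):
        approx = weights @ vectors
        residual = target - approx
        # Pick the vertex best aligned with the current residual.
        best = int(np.argmax(vectors @ residual / norms))
        vertex = (total_norm / norms[best]) * vectors[best]
        direction = vertex - approx
        denom = direction @ direction
        if denom == 0.0:
            break
        # Exact line search for the squared error, clipped to [0, 1].
        gamma = float(np.clip((direction @ residual) / denom, 0.0, 1.0))
        weights *= 1.0 - gamma
        weights[best] += gamma * total_norm / norms[best]
    return weights
```

Each iteration adds at most one new index to the support, so the number of iterations bounds the size of the weighted subset, and the clipped line search keeps the approximation error from increasing between iterations.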

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program