Abstract:
|
The automation of posterior inference in Bayesian data analysis has enabled experts and nonexperts alike to use more sophisticated models, engage in faster exploratory modeling and analysis, and ensure experimental reproducibility. However, standard automated posterior inference algorithms are not tractable at the scale of massive modern datasets, and modifications to make them so are typically model-specific, require expert tuning, and can break theoretical guarantees on inferential quality. This talk will instead take advantage of data redundancy to shrink the dataset itself as a preprocessing step, forming a "Bayesian coreset". The coreset can be used in a standard inference algorithm at significantly reduced cost while maintaining theoretical guarantees on posterior approximation quality. The talk will include an intuitive formulation of Bayesian coreset construction as sparse vector sum approximation, two automated coreset construction algorithms that take advantage of this formulation, theoretical guarantees on posterior approximation quality, and numerous applications to real large-scale data analysis problems.
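To make the "sparse vector sum approximation" view concrete, the sketch below is a minimal, illustrative example only (it is not one of the two algorithms presented in the talk): each data point is represented by a vector, the full dataset corresponds to the sum of all vectors, and a coreset is a sparse, nonnegative reweighting whose weighted sum approximates that full sum. The function name `greedy_coreset`, the feature matrix `G`, and the matching-pursuit-style greedy update are assumptions introduced purely for exposition.

```python
import numpy as np

def greedy_coreset(G, k):
    """Illustrative sketch: build a sparse weight vector w so that the
    weighted sum of the rows of G approximates the full (unweighted) row sum.

    G : (n, d) array; row i is a vector standing in for data point i's
        contribution (hypothetical stand-in for a projected log-likelihood).
    k : number of greedy iterations (coreset size budget).
    """
    n, _ = G.shape
    target = G.sum(axis=0)               # full-data vector sum to approximate
    w = np.zeros(n)                      # sparse, nonnegative weights
    for _ in range(k):
        residual = target - G.T @ w      # what is still unexplained
        i = int(np.argmax(G @ residual)) # point most aligned with the residual
        g_i = G[i]
        # closed-form line search: amount of g_i that best reduces the
        # squared error of the current approximation
        step = float(g_i @ residual) / float(g_i @ g_i)
        w[i] += max(step, 0.0)           # keep weights nonnegative
    return w

# Usage sketch on synthetic vectors: the weighted sum over a small support
# approximates the sum over all 1000 points.
rng = np.random.default_rng(0)
G = rng.normal(size=(1000, 20))
w = greedy_coreset(G, k=50)
rel_err = np.linalg.norm(G.sum(axis=0) - G.T @ w) / np.linalg.norm(G.sum(axis=0))
print("support size:", np.count_nonzero(w), "relative error:", rel_err)
```

The resulting weights are nonzero for only a small subset of points, which is what allows a downstream inference algorithm to run on the weighted subset at much lower cost.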
|