BayesDB is a probabilistic programming platform with built-in non-parametric Bayesian model discovery. It lets users search, clean, and model multivariate databases using an SQL-like language. This talk will illustrate the capabilities and limitations of the current open-source prototype through applications to real-world databases of Earth satellites and psychiatric health surveys. It will also discuss new research opportunities at the intersection of probabilistic programming and computational statistics.
Probabilistic programming is an emerging field based on the insight that probabilistic models and inference algorithms are a new kind of software, and are therefore amenable to radical improvements in accessibility, productivity, and scale. Unfortunately, most probabilistic programming systems require users to write probabilistic programs by hand. In contrast, BayesDB provides a built-in probabilistic program synthesis system that builds generative models for multivariate databases by performing inference over programs under a non-parametric Bayesian prior. BayesDB also lets statisticians override these synthesized programs with custom statistical models when appropriate.