Abstract:
|
In many settings, an unknown amount of data arrives sequentially in time, and because of either the volume of data or the complexity of the model, repeatedly performing batch inference as each observation arrives is infeasible. Bayesian nonparametric (BNP) models are well suited to such streaming-data scenarios because they can adapt model complexity to the observed data. Unfortunately, existing inference algorithms are either not applicable in streaming settings or not extensible to general BNP models, limiting their use on streaming data. Although a streaming inference algorithm has been developed for the Dirichlet process, there is growing interest in more flexible BNP models built on the class of normalized random measures (NRMs). We present a streaming variational inference algorithm for a general class of mixture models built from the family of NRMs. Our algorithm is based on assumed density filtering (ADF), which also leads straightforwardly to an expectation propagation (EP) algorithm for large-scale batch inference. We demonstrate the efficacy of these algorithms by clustering documents in large, streaming text corpora.
|