Abstract:
|
Finite mixtures are a flexible building block in the probabilistic modeler's toolkit, widely used in applications such as density estimation and clustering. An important issue in standard applications of discrete mixtures is low separation between components: the fitted model can include components that are nearly identical and hence redundant. Such redundancy produces extraneous clusters, which degrades performance, harms interpretability, complicates computation, and yields unnecessarily complex models. In the absence of a penalty on components placed close together, redundancy can arise even when a Bayesian approach is used to learn the number of components. To address this problem, we propose a novel prior that generates components from a repulsive point process, namely the Matérn point process. Our model allows the number of mixture components to be estimated from the data while automatically penalizing redundant components. We characterize this repulsive prior theoretically and propose an efficient Markov chain Monte Carlo sampling algorithm for posterior computation. We evaluate the proposed method on synthetic and real datasets.
|