Abstract:
|
When identifying subpopulations within data, true group structure can be masked by extraneous variables, which motivates a variable selection procedure that identifies the variables important for model-based clustering. In the current clustering literature, empirical Bayes methods tackle the simultaneous model-based clustering and variable selection problem. These approaches have limitations, primarily the assumption that a single locally optimal solution exists. We propose a fully Bayesian approach in which a set of globally optimal solutions is found using the reversible-jump Markov chain Monte Carlo algorithm. Our method permits modeling of the full likelihood, in which the cluster membership proportion, mean, and covariance parameters of each component are estimated. We also incorporate variance constraint selection using the covariance constraints of Banfield and Raftery [1993]. Our method allows dimension changes in the variables, variance constraints, and group subspaces, resulting in a complete representation of the clustering model with simultaneous variable selection.
|