Abstract:
Deciding on criteria for partitioning a p-dimensional random vector X into groups of similar variables is a fundamental problem in variable clustering, a technique widely used in genetics and neuroscience. We advocate the use of probabilistic models for defining a partition G of the components of X. For a fixed but unknown partition G, we introduce three models of increasing complexity: the G-latent, G-exchangeable and G-block-covariance models. We show that the G-exchangeable and G-block-covariance models are identifiable, irrespective of the distribution of X. Moreover, if X is Gaussian, we give mild conditions under which all three models are identifiable. We develop a new, computationally efficient method, called Correlation-Fusion, that is shown to recover the unknown partition G with high probability when the data are generated from a Gaussian copula distribution. A minimax lower bound shows that our conditions for recovery are sharp. An extensive simulation study shows that the new method outperforms existing clustering algorithms when the data are generated from a G-model. We illustrate the procedure by estimating regions of interest from an fMRI data set.
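To make the idea of a partition defined through a covariance model concrete, the following is a minimal sketch under stated assumptions. The abstract does not define the G-block-covariance model, so the sketch assumes one common reading: the covariance has the form Sigma = A C A^T + Gamma, where A is the group membership matrix, C is a symmetric matrix of between-group covariances, and Gamma is diagonal. Under that assumption, two variables in the same group have identical covariances with every third variable, and the partition can be read off the population covariance. The function names (block_covariance, row_difference, groups_from_distance) and all numerical values are hypothetical, and this is not the Correlation-Fusion method, which works from data rather than from a known covariance.

```python
import numpy as np


def block_covariance(labels, C, gamma):
    """Build Sigma = A C A^T + diag(gamma) (assumed block-covariance form)."""
    labels = np.asarray(labels)
    K = C.shape[0]
    A = np.eye(K)[labels]  # p x K membership matrix with 0/1 entries
    return A @ C @ A.T + np.diag(gamma)


def row_difference(Sigma):
    """D[a, b] = max over c not in {a, b} of |Sigma[a, c] - Sigma[b, c]|."""
    p = Sigma.shape[0]
    D = np.zeros((p, p))
    for a in range(p):
        for b in range(a + 1, p):
            mask = np.ones(p, dtype=bool)
            mask[[a, b]] = False  # exclude the two variables being compared
            diff = np.max(np.abs(Sigma[a, mask] - Sigma[b, mask]))
            D[a, b] = D[b, a] = diff
    return D


def groups_from_distance(D, tol=1e-12):
    """Put variables whose row difference is (numerically) zero in one group."""
    p = D.shape[0]
    labels = -np.ones(p, dtype=int)
    next_label = 0
    for a in range(p):
        if labels[a] == -1:
            labels[(D[a] <= tol) & (labels == -1)] = next_label
            next_label += 1
    return labels


# Hypothetical example: p = 7 variables in 3 groups.
true_labels = [0, 0, 1, 1, 1, 2, 2]
C = np.array([[1.0, 0.3, 0.1],
              [0.3, 0.8, 0.2],
              [0.1, 0.2, 0.6]])
gamma = np.array([0.5, 0.7, 0.4, 0.6, 0.5, 0.3, 0.8])  # variable-specific noise

Sigma = block_covariance(true_labels, C, gamma)
recovered = groups_from_distance(row_difference(Sigma))
print(recovered)
```

The sketch works with the exact population covariance, so a zero tolerance suffices. With an estimated covariance the row differences within a group are no longer exactly zero, which is where a threshold and recovery conditions of the kind the abstract mentions come into play.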
Copyright © American Statistical Association.