Abstract:
We consider the two-class clustering problem, where we have measurements of a large number of features but only a small fraction of them contribute to the power of clustering. In the two-dimensional phase space calibrating the rarity of the useful features and their strengths, we find the precise demarcation between the Region of Impossibility and the Region of Possibility. In the former, the useful features are too rare/weak to allow successful clustering. In the latter, the useful features are strong enough that successful clustering is possible. We propose using both classical PCA and Important Features PCA (IF-PCA) for clustering. For a threshold t > 0, IF-PCA first removes all columns of X whose L2-norm falls below t, and then performs clustering using classical PCA. We also propose two aggregation methods for clustering. We show that, for any parameter in the Region of Possibility, at least one of these four methods yields successful clustering. We also extend the study to two closely related problems: the signal recovery problem and the hypothesis testing problem. We compare the fundamental limits for all three problems and reveal some interesting insights.
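The two-step IF-PCA procedure described above (screen columns by L2-norm, then apply classical PCA) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that X is an n-by-p array with one sample per row, that the data are already centered, and that the classical PCA step assigns the two classes by the sign of the leading left singular vector. The abstract does not spell out that last step, and the function name `if_pca_cluster` is hypothetical.

```python
import numpy as np


def if_pca_cluster(X, t):
    """Sketch of IF-PCA for two-class clustering.

    Assumptions (not stated in the abstract): rows of X are samples,
    columns are features, the data are centered, and classical PCA
    clusters by the sign of the leading left singular vector.
    """
    X = np.asarray(X, dtype=float)

    # Step 1: feature screening. Keep columns whose L2-norm is at least t,
    # i.e. remove every column whose L2-norm falls below t.
    col_norms = np.linalg.norm(X, axis=0)
    keep = col_norms >= t
    if not keep.any():
        raise ValueError("threshold t removes every column of X")
    X_kept = X[:, keep]

    # Step 2: classical PCA on the retained columns.
    # The first column of U is the leading left singular vector.
    U, _, _ = np.linalg.svd(X_kept, full_matrices=False)
    leading = U[:, 0]

    # Assign classes by sign; labels are identifiable only up to a swap.
    labels = (leading > 0).astype(int)
    return labels, keep
```

The function returns the estimated labels together with the boolean mask of retained columns, so the screening step can be inspected separately. Because the two classes are exchangeable, the labels should be compared with any reference labeling only up to a swap of 0 and 1.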
ASA Meetings Department
732 North Washington Street, Alexandria, VA 22314
(703) 684-1221 • meetings@amstat.org
Copyright © American Statistical Association.