Abstract:
|
In high-dimensional clustering problems, it is often the case that only a small number of features drive meaningful differences in cluster membership, while the remaining features contribute only noise. Sparse clustering using a Lasso penalty, introduced by Witten and Tibshirani, can be applied to k-means and standard hierarchical clustering procedures and has been shown to produce more interpretable and more accurate clusters than non-sparse implementations. In particular, sparse hierarchical clustering can be achieved by penalizing the distance matrix that is input to the algorithm. We consider a specific type of hierarchical clustering, monothetic clustering, a divisive method that recursively partitions the data on one feature at a time. By imposing a Lasso constraint on the distance matrix input to monothetic clustering, we obtain a sparse monothetic clustering. We compare our results on real and simulated datasets to other methods, both sparse and non-sparse.
|