
All Times EDT

Abstract Details

Activity Number: 351 - Variable Selection and Computationally Intensive Methods
Type: Contributed
Date/Time: Wednesday, August 5, 2020, 10:00 AM to 2:00 PM
Sponsor: Section on Statistical Computing
Abstract #312861
Title: A Method for Sparse Monothetic Clustering
Author(s): Paul Harmon* and Mark Greenwood
Companies: Montana State University and Montana State University
Keywords: clustering; data science; lasso; machine learning; high-dimensional data; monothetic clustering
Abstract:

In clustering problems with high-dimensional data, it is often the case that only a small number of features drive meaningful differences in cluster membership, while many of the remaining features simply contribute noise. Sparse clustering based on a Lasso penalty, introduced by Witten and Tibshirani, can be applied to k-means and to standard hierarchical clustering procedures, and has been shown to produce more interpretable and more accurate clusters than non-sparse implementations. In particular, sparse hierarchical clustering can be achieved by penalizing the distance matrix that is input to the algorithm. We consider a specific type of hierarchical clustering, monothetic clustering, a divisive method that recursively partitions the data on one feature at a time. By imposing a Lasso constraint on the distance matrix input to monothetic clustering, we obtain a sparse monothetic clustering. We compare our results on real and simulated datasets to other methods, both sparse and non-sparse.
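A minimal sketch of the idea the abstract describes, not the authors' implementation: in the Witten-Tibshirani framework, a nonnegative weight vector over features is chosen to maximize the weighted total dissimilarity subject to an L2 bound and a Lasso (L1) bound, which soft-thresholds noise features to exact zero; a divisive monothetic step then splits on a single surviving feature. The per-feature Manhattan dissimilarity, the constraint value s, and all function names below are illustrative assumptions.

```python
import numpy as np

def per_feature_dissim(X):
    """a[j] = total pairwise |x_ij - x_i'j| for feature j
    (Manhattan per-feature dissimilarity; an illustrative choice)."""
    p = X.shape[1]
    return np.array([np.abs(X[:, j][:, None] - X[:, j][None, :]).sum()
                     for j in range(p)])

def sparse_weights(a, s, tol=1e-8):
    """Maximize w.a subject to ||w||_2 <= 1, ||w||_1 <= s, w >= 0
    (the Witten-Tibshirani criterion), solved by soft-thresholding a
    with a binary search for the threshold. Assumes s >= 1."""
    w = a / np.linalg.norm(a)
    if w.sum() <= s:
        return w                       # L1 constraint already inactive
    lo, hi = 0.0, a.max()
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        w_raw = np.maximum(a - mid, 0.0)   # soft-threshold at mid
        nrm = np.linalg.norm(w_raw)
        if nrm == 0 or w_raw.sum() / nrm <= s:
            hi = mid                   # threshold large enough
        else:
            lo = mid
    w = np.maximum(a - hi, 0.0)
    return w / np.linalg.norm(w)

def within_cost(Xg, w):
    """Weighted sum of within-group pairwise dissimilarities,
    scaled by group size."""
    if Xg.shape[0] < 2:
        return 0.0
    c = sum(w[j] * np.abs(Xg[:, j][:, None] - Xg[:, j][None, :]).sum()
            for j in range(Xg.shape[1]))
    return c / (2.0 * Xg.shape[0])

def monothetic_split(X, w):
    """One divisive step: the best binary split on a *single* feature,
    chosen to minimize total weighted within-group cost. Zero-weight
    features are never split on -- the payoff of sparsity."""
    best_cost, best_j, best_t = np.inf, None, None
    for j in range(X.shape[1]):
        if w[j] == 0.0:
            continue
        for t in np.unique(X[:, j])[:-1]:  # candidate thresholds
            left = X[:, j] <= t
            cost = within_cost(X[left], w) + within_cost(X[~left], w)
            if cost < best_cost:
                best_cost, best_j, best_t = cost, j, t
    return best_j, best_t
```

On simulated data where only one feature separates two groups and the rest are noise, the weights concentrate on the informative feature and the first monothetic split is made on it; a full method would recurse on each resulting group.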


Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program