Abstract Details

Activity Number: 376
Type: Contributed
Date/Time: Tuesday, August 2, 2016, 10:30 AM to 12:20 PM
Sponsor: Section on Statistics in Genomics and Genetics
Abstract #319726
Title: A Stability Analysis of Sparse K-Means
Author(s): Abraham Apfel* and Stewart J. Anderson
Companies: University of Pittsburgh and University of Pittsburgh
Keywords: high-dimensional; stability; clustering; sparse
Abstract:

Sparse K-Means clustering is an established method for simultaneously excluding uninformative features and clustering the observations. This is particularly useful in high-dimensional settings such as microarray data. However, the subset of features it selects is often inaccurate when clusters overlap, which adversely affects the clustering results. The current method also tends to be inconsistent, yielding high variability in the number of features selected. We propose to combine a stability analysis with Sparse K-Means by performing Sparse K-Means on subsamples of the original data, which yields accurate and consistent feature selection. After the dimension has been reduced to an accurate, small subset of features, the standard K-Means clustering procedure is performed to obtain accurate clustering results. Compared with the previously established Sparse K-Means methodology, our method improves accuracy and reduces variability, providing consistent feature selection as well as a lower clustering error rate (CER). It also continues to perform well in situations with strong cluster overlap, where the previous methods were unsuccessful.
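The abstract does not spell out the algorithm's details, so the following is only a rough, standard-library Python sketch of how the described steps could fit together, not the authors' implementation. It rests on several assumptions that are not stated in the abstract: Sparse K-Means is approximated by the alternating weight update of Witten and Tibshirani (2010), with an L1 bound `s` on the feature weights; "stability" is measured as the fraction of half-sized subsamples in which a feature receives a nonzero weight; features at or above a frequency threshold are kept; and CER is taken as the proportion of observation pairs on which two partitions disagree about whether the pair is clustered together. All function names and defaults (`n_sub`, `frac`, `threshold`, `n_outer`) are hypothetical.

```python
import random

def kmeans(X, k, weights=None, n_iter=50, seed=0):
    """Lloyd's K-Means; optional per-feature weights scale the squared distances."""
    rng = random.Random(seed)
    p = len(X[0])
    w = weights if weights is not None else [1.0] * p

    def dist(x, c):
        return sum(w[j] * (x[j] - c[j]) ** 2 for j in range(p))

    centers = [list(x) for x in rng.sample(X, k)]  # random observations as initial centers
    labels = None
    for _ in range(n_iter):
        new = [min(range(k), key=lambda c: dist(x, centers[c])) for x in X]
        if new == labels:  # assignments stopped changing
            break
        labels = new
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def bcss(X, labels):
    """Per-feature between-cluster sum of squares: total SS minus within-cluster SS."""
    out = []
    for j in range(len(X[0])):
        col = [x[j] for x in X]
        mean = sum(col) / len(col)
        total = sum((v - mean) ** 2 for v in col)
        within = 0.0
        for c in set(labels):
            vals = [v for v, lab in zip(col, labels) if lab == c]
            m = sum(vals) / len(vals)
            within += sum((v - m) ** 2 for v in vals)
        out.append(total - within)
    return out

def update_weights(a, s):
    """w = S(a+, delta) / ||S(a+, delta)||_2, with delta >= 0 chosen so ||w||_1 <= s."""
    def soft(delta):
        v = [max(ai - delta, 0.0) for ai in a]  # soft-threshold, clipped at zero
        norm = sum(vi * vi for vi in v) ** 0.5
        return [vi / norm for vi in v] if norm > 0 else v

    w0 = soft(0.0)
    if sum(w0) <= s:
        return w0
    lo, hi = 0.0, max(a)  # invariant: sum(soft(hi)) <= s
    for _ in range(60):  # binary search on the threshold delta
        mid = (lo + hi) / 2
        if sum(soft(mid)) > s:
            lo = mid
        else:
            hi = mid
    return soft(hi)

def sparse_kmeans(X, k, s, n_outer=5, seed=0):
    """Alternate weighted K-Means and weight updates (assumes 1 < s <= sqrt(p))."""
    p = len(X[0])
    w = [p ** -0.5] * p  # equal initial weights
    for _ in range(n_outer):
        labels = kmeans(X, k, weights=w, seed=seed)
        w = update_weights(bcss(X, labels), s)
    return w, labels

def stable_features(X, k, s, n_sub=20, frac=0.5, threshold=0.6, seed=0):
    """Keep features with nonzero weight in at least `threshold` of the subsamples."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for b in range(n_sub):
        idx = rng.sample(range(n), int(frac * n))  # subsample without replacement
        w, _ = sparse_kmeans([X[i] for i in idx], k, s, seed=b)
        for j in range(p):
            counts[j] += int(w[j] > 0)
    return [j for j in range(p) if counts[j] / n_sub >= threshold]

def stability_sparse_kmeans(X, k, s, **kwargs):
    """Stable feature selection, then standard K-Means on the selected features only."""
    keep = stable_features(X, k, s, **kwargs)
    if not keep:
        raise ValueError("no feature passed the stability threshold")
    return keep, kmeans([[x[j] for j in keep] for x in X], k)

def cer(P, Q):
    """Clustering error rate: share of pairs whose co-membership differs between P and Q."""
    n = len(P)
    disagree = sum((P[i] == P[j]) != (Q[i] == Q[j])
                   for i in range(n) for j in range(i + 1, n))
    return disagree / (n * (n - 1) / 2)
```

Because `cer` compares pairwise co-membership rather than raw labels, it is invariant to how the clusters happen to be numbered, which is why it can compare a clustering directly against known classes in a simulation. In practice the bound `s`, the subsample fraction, and the selection threshold would all need tuning; the values above are placeholders.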


Authors who are presenting talks have a * after their name.

Back to the full JSM 2016 program

Copyright © American Statistical Association