JSM 2012 Online Program

The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.

Abstract Details

Activity Number: 509
Type: Contributed
Date/Time: Wednesday, August 1, 2012 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Learning and Data Mining
Abstract - #306675
Title: Evaluation and Developments of Methods for the Tuning Parameter Choice in Sparse Clustering
Author(s): Wenzhu Bi*+, George C. Tseng, and Lisa A. Weissfeld
Companies: University of Pittsburgh (all authors)
Address: 130 DeSoto Street, Pittsburgh, PA, 15261, United States
Keywords: clustering ; Lasso ; tuning parameter ; gap statistics ; Bayesian Information Criterion (BIC)
Abstract:

In sparse clustering, i.e., clustering with variable selection, the results obtained with the Lasso or related penalties depend on the choice of the tuning (penalization) parameter. The magnitude of the tuning parameter relates to the number of variables selected for clustering. Pan and Shen (2007) used a modified Bayesian Information Criterion (BIC) to choose the number of clusters and the tuning parameter simultaneously. Witten and Tibshirani (2010) proposed using the gap statistic to choose the tuning parameter. Both methods rely on a correctly specified pool of candidate tuning parameters. We have observed that, given a suboptimal pool, the gap statistic method can choose a tuning parameter that yields poor clustering performance, i.e., a high classification error rate. We propose two methods to improve the choice of the tuning parameter for the sparse k-means method. The first uses an adjusted BIC in the likelihood framework; the second uses cluster validation, which is similar to the cross-validation commonly used to choose the tuning parameter in penalized linear regression or classification. The performance of these methods is evaluated via simulation studies.
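
As a rough illustration of how the tuning parameter relates to the number of selected variables, the sketch below follows the feature-weight step of sparse k-means as we understand the Witten and Tibshirani (2010) formulation: the weights maximize the weighted between-cluster sum of squares subject to an L2 bound of 1 and an L1 bound s, which leads to soft-thresholding with a threshold found by bisection. It is not the adjusted BIC or cluster-validation method proposed in this abstract. The function names, the per-feature values in `bcss`, and the candidate pool `(1.2, 1.42, 2.0)` are made up for illustration, and the sketch assumes 1 <= s <= sqrt(p) with a unique largest entry in `bcss`.

```python
import math


def soft_threshold(values, delta):
    """Shrink each non-negative value toward zero by delta, flooring at zero."""
    return [max(v - delta, 0.0) for v in values]


def l2_normalize(values):
    """Scale a vector to unit L2 norm (an all-zero vector is returned as is)."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm > 0 else list(values)


def sparse_weights(bcss, s, iters=60):
    """Weights maximizing sum(w_j * bcss_j) subject to
    ||w||_2 <= 1, ||w||_1 <= s, and w_j >= 0."""
    w = l2_normalize(bcss)
    if sum(w) <= s:
        return w  # L1 bound is inactive, so no feature is dropped
    lo, hi = 0.0, max(bcss)
    for _ in range(iters):  # bisection on the soft-threshold level
        mid = (lo + hi) / 2
        if sum(l2_normalize(soft_threshold(bcss, mid))) > s:
            lo = mid  # still violates the L1 bound: threshold harder
        else:
            hi = mid
    return l2_normalize(soft_threshold(bcss, hi))


# Hypothetical per-feature between-cluster sums of squares (illustrative only).
bcss = [10.0, 8.0, 1.0, 0.5, 0.2]

for s in (1.2, 1.42, 2.0):  # a made-up pool of candidate tuning parameters
    w = sparse_weights(bcss, s)
    selected = sum(1 for wj in w if wj > 0)
    print(f"s={s}: {selected} features with nonzero weight")
```

In this toy setting a smaller s drives more weights exactly to zero, so the candidate pool fixes which sparsity levels any selection criterion (gap statistic, BIC, or validation-based) can choose from.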


The address information is for the authors that have a + after their name.
Authors who are presenting talks have a * after their name.
