JSM 2014 Home

Abstract Details

Activity Number: 551
Type: Topic Contributed
Date/Time: Wednesday, August 6, 2014, 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Learning and Data Mining
Abstract #312473
Title: Unsupervised Learning: Assessing Cluster Significance Through a Combination of Cross-Validation and Resampling
Author(s): Werner Stuetzle*+
Companies: University of Washington
Keywords: clustering; resampling; cross-validation; single linkage; mode
Abstract:

The goal of clustering is to detect the presence of distinct groups in a data set and assign group labels to the observations. Nonparametric clustering is based on the premise that the observations may be regarded as a sample from some underlying density in feature space and that groups correspond to modes of this density. We use Generalized Single Linkage (GSL) clustering (Stuetzle and Nugent, JCGS Vol 19, No. 2, 2010, pp. 397-418) as our clustering method. The question then arises whether clusters in the data suggested by GSL indeed correspond to distinct modes of the underlying density or can be attributed to sampling variability. We propose a heuristic based on a combination of cross-validation and resampling to answer this question, and we present the results of Monte Carlo experiments assessing the level and power of our method.


Authors who are presenting talks have a * after their name.

