Activity Number: 340
Type: Contributed
Date/Time: Tuesday, July 31, 2007, 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Computing

Abstract - #309601

Title: Presence versus Significance: How Significant Are Your Clusters?
Author(s): Rebecca Nugent*+ and Werner Stuetzle
Companies: Carnegie Mellon University and University of Washington
Address: Department of Statistics, Baker Hall 132, Pittsburgh, PA 15217
Keywords: clustering; confidence; bootstrap; level sets; generalized single linkage

Abstract:
The goal of clustering is to identify distinct groups in a dataset and assign a group label to each observation. To cast clustering as a statistical problem, we regard the data as a sample from an unknown population with density p(x). However, clustering methods rarely produce a one-to-one mapping of clusters to groups in the population: a single group may be split into several clusters, and spurious clusters may be identified where no group exists. A cluster's presence does not imply its significance. We represent the hierarchical cluster structure of p(x) by the cluster tree of its level sets and introduce a bootstrap-based simultaneous confidence band for p(x) that is used to estimate this tree. The resulting method, Clustering with Confidence, assigns a significance level to the cluster tree and to the individual clusters. Results will be shown for a graph-based estimation approach, generalized single linkage.
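The abstract does not spell out how the band is built, so the following is only a minimal one-dimensional sketch of the "presence versus significance" idea under our own assumptions. It uses a Gaussian kernel density estimate on a grid, a simple sup-norm bootstrap band (a common textbook choice that ignores smoothing bias and is not necessarily the authors' construction), and level-set clusters computed as runs of grid points where the density exceeds a level lam. A cluster is counted as "confident" at level lam only if it survives in both the lower and upper band curves. All function names, parameters, and the simulated data are hypothetical, and the sketch does not implement generalized single linkage.

```python
import math
import random

def kde(data, grid, h):
    """Gaussian kernel density estimate of `data`, evaluated at each grid point."""
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)
            for x in grid]

def level_set_components(density, lam):
    """Maximal runs of grid indices where density > lam, as (start, end) pairs.
    On a 1-D grid these runs are the connected components of the level set."""
    comps, start = [], None
    for i, d in enumerate(density):
        if d > lam and start is None:
            start = i
        elif d <= lam and start is not None:
            comps.append((start, i - 1))
            start = None
    if start is not None:
        comps.append((start, len(density) - 1))
    return comps

def bootstrap_band(data, grid, h, B=40, alpha=0.1, seed=0):
    """Hypothetical sup-norm bootstrap band: p_hat +/- q, where q is the
    (1 - alpha) quantile of max_x |p*(x) - p_hat(x)| over B resamples."""
    rng = random.Random(seed)
    p_hat = kde(data, grid, h)
    sups = []
    for _ in range(B):
        resample = [rng.choice(data) for _ in data]
        p_star = kde(resample, grid, h)
        sups.append(max(abs(a - b) for a, b in zip(p_star, p_hat)))
    sups.sort()
    q = sups[min(B - 1, math.ceil((1 - alpha) * B) - 1)]
    lower = [max(0.0, p - q) for p in p_hat]  # a density cannot be negative
    upper = [p + q for p in p_hat]
    return p_hat, lower, upper

def confident_clusters(lower, upper, lam):
    """Count components of {upper > lam} that contain a point with lower > lam.
    If lower <= p <= upper, each such component holds part of {p > lam}, and
    distinct components are separated by points where p <= upper <= lam."""
    return sum(
        any(lower[i] > lam for i in range(s, e + 1))
        for s, e in level_set_components(upper, lam)
    )

# Hypothetical data: a well-separated two-component Gaussian mixture.
rng = random.Random(1)
data = [rng.gauss(-2.0, 0.5) if rng.random() < 0.5 else rng.gauss(2.0, 0.5)
        for _ in range(400)]
grid = [-4.0 + 8.0 * k / 79 for k in range(80)]
p_hat, lower, upper = bootstrap_band(data, grid, h=0.4)
for lam in (0.05, 0.15, 0.25):
    # clusters present in the estimate vs. clusters supported by the band
    print(lam, len(level_set_components(p_hat, lam)),
          confident_clusters(lower, upper, lam))
```

Comparing the two printed counts at each level shows the distinction the abstract draws: the level set of the point estimate can contain clusters that the band does not support. Applying the same check across all levels would give a band-based view of the cluster tree; the exact counts depend on the simulated sample and the bandwidth.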