Activity Number: 383
Type: Contributed
Date/Time: Wednesday, August 9, 2006 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Computing

Abstract - #307498

Title: A Scale-Independent Clustering Method with Automatic Variable Selection Based on Trees
Author(s): Samuel Buttrey*+
Companies: Naval Postgraduate School
Address: Code OR Sb, Monterey, CA, 93950,
Keywords: cluster quality ; classification and regression trees ; prediction strength
Abstract:
Clustering techniques usually rely on measurements of distances (or dissimilarities) among observations and clusters. These distances are often affected by how the variables are scaled or transformed, and they provide no means of selecting "important" variables. We fit a set of regression or classification trees, with each variable acting in turn as the response. Points are "close" to one another if they tend to appear in the same leaves of these trees. Trees with poor predictive power are discarded. "Noise" variables that appear in none of the trees have no effect on the clustering and can be ignored. The clustering is unaffected by linear transformations of the continuous variables and resistant to monotonic ones. Categorical variables are included automatically. We demonstrate the technique on well-known noisy data sets. This paper updates an idea proposed at JSM 2004.
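The idea in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the abstract does not specify how trees are grown or pruned, how predictive power is measured, or the exact dissimilarity formula. The sketch assumes numeric variables only, fixed-depth regression trees, an in-sample R² threshold as a stand-in for "predictive power", and a dissimilarity equal to the fraction of retained trees in which two points fall in different leaves. The names `fit_leaves`, `tree_dissimilarity`, and `min_r2` are hypothetical.

```python
from statistics import mean

def sse(ys):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def fit_leaves(data, response, max_depth=1, min_leaf=2):
    """Grow a small regression tree predicting column `response` from the
    other columns. Returns (leaf id for each row, in-sample R^2)."""
    n, p = len(data), len(data[0])
    leaves = [0] * n
    next_id = [0]

    def grow(idx, depth):
        ys = [data[i][response] for i in idx]
        best = None
        if depth < max_depth and len(idx) >= 2 * min_leaf:
            parent = sse(ys)
            for j in range(p):
                if j == response:
                    continue
                values = sorted({data[i][j] for i in idx})
                for lo, hi in zip(values, values[1:]):
                    cut = (lo + hi) / 2
                    left = [i for i in idx if data[i][j] <= cut]
                    right = [i for i in idx if data[i][j] > cut]
                    if len(left) < min_leaf or len(right) < min_leaf:
                        continue
                    gain = (parent
                            - sse([data[i][response] for i in left])
                            - sse([data[i][response] for i in right]))
                    if gain > 1e-12 and (best is None or gain > best[0]):
                        best = (gain, left, right)
        if best is None:
            # No useful split: this node becomes a leaf.
            leaf = next_id[0]
            next_id[0] += 1
            for i in idx:
                leaves[i] = leaf
            return sse(ys)
        return grow(best[1], depth + 1) + grow(best[2], depth + 1)

    total = sse([row[response] for row in data])
    resid = grow(list(range(n)), 0)
    r2 = 1 - resid / total if total > 0 else 0.0
    return leaves, r2

def tree_dissimilarity(data, min_r2=0.5, **tree_args):
    """One tree per variable; discard weak trees (assumed R^2 cutoff);
    d[i][j] = share of kept trees placing rows i and j in different leaves."""
    kept = []
    for response in range(len(data[0])):
        leaves, r2 = fit_leaves(data, response, **tree_args)
        if r2 >= min_r2:
            kept.append(leaves)
    if not kept:
        raise ValueError("no tree passed the predictive-power threshold")
    n = len(data)
    d = [[sum(l[i] != l[j] for l in kept) / len(kept) for j in range(n)]
         for i in range(n)]
    return d, len(kept)

# Toy data: columns 0 and 1 carry two groups; column 2 is unrelated noise.
data = [
    (1.0, 10.0, 3.0), (1.2, 11.0, 7.0), (0.9, 9.5, 5.0), (1.1, 10.5, 1.0),
    (5.0, 50.0, 7.0), (5.2, 51.0, 3.0), (4.9, 49.5, 1.0), (5.1, 50.5, 5.0),
]
d, n_kept = tree_dissimilarity(data)

# Linearly rescaling a column should leave the dissimilarities unchanged.
rescaled = [(a, 1000 * b + 7, c) for a, b, c in data]
d_rescaled, _ = tree_dissimilarity(rescaled)
```

On this toy data the tree for the noise column explains little of its variance and is dropped, so the noise column influences neither the retained trees nor the dissimilarities. The resulting matrix `d` could then be passed to any clustering routine that accepts dissimilarities; that step is omitted here.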