Activity Number: 335

Type: Contributed

Date/Time: Tuesday, August 8, 2006, 2:00 PM to 3:50 PM

Sponsor: Section on Nonparametric Statistics

Abstract - #307005

Title: Scrambling Method for Cluster Analysis Using Supervised Learning

Author(s): Oksana Shcherbak*+

Companies: Union Bank of California

Address: 2673 Matera Lane, San Diego, CA 92108

Keywords: cluster analysis; CART decision trees; supervised learning; multivariate analysis; data segmentation; outlier analysis

Abstract:

Finding high-density regions in the data space, a problem also known as data segmentation, is an example of unsupervised learning that often arises in multivariate analysis. In this paper we present a scrambling technique that allows supervised learning methods such as CART decision trees to be used for unsupervised learning. We found the performance of several CART models, fitted to different datasets obtained with the scrambling method, to be quite strong and stable for high-dimensional problems. Examining the final models, we detected informative behavioral patterns in our population by producing mutually exclusive dependency rules, and we found interesting departures from those rules that led to an effective analysis of outliers. We conclude that scrambling analysis holds promise as a tool for detecting clusters and outliers in data.
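The abstract does not spell out the scrambling procedure. The sketch below assumes it works like the familiar "unsupervised as supervised" trick: permute each column independently to build a reference sample that keeps every variable's marginal distribution but loses the dependence between variables, label real rows 1 and scrambled rows 0, and fit a classification tree. Leaves with a high share of real rows then mark high-density regions. The tree is a small Gini-based stand-in for CART written from scratch, and the names `scramble`, `best_split`, `grow` and `predict` are hypothetical, not the author's code.

```python
import random


def scramble(rows, seed=0):
    """Build a reference sample by permuting each column independently.

    Each variable keeps its marginal distribution, but the dependence
    between variables (the cluster structure) is destroyed."""
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*rows)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, labels):
    """Search all axis-aligned splits for the lowest weighted Gini impurity.

    Returns (impurity, feature, threshold), or None if every row is identical."""
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        # Dropping the largest value keeps both sides non-empty.
        for t in sorted({row[j] for row in rows})[:-1]:
            left = [y for row, y in zip(rows, labels) if row[j] <= t]
            right = [y for row, y in zip(rows, labels) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


def grow(rows, labels, depth):
    """Grow a tree greedily; each leaf stores the share of real (label 1) rows.

    There is no minimum-improvement rule: the first split can never lower
    the impurity here, because scrambling preserves every marginal."""
    mixed = 0 < sum(labels) < len(labels)
    split = best_split(rows, labels) if depth > 0 and mixed else None
    if split is None:
        return sum(labels) / len(labels)
    _, j, t = split
    left = [(row, y) for row, y in zip(rows, labels) if row[j] <= t]
    right = [(row, y) for row, y in zip(rows, labels) if row[j] > t]
    return (
        j,
        t,
        grow([row for row, _ in left], [y for _, y in left], depth - 1),
        grow([row for row, _ in right], [y for _, y in right], depth - 1),
    )


def predict(tree, x):
    """Follow the splits down to a leaf and return its share of real rows."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree


# Toy data (hypothetical): two tight clusters on the diagonal.
real = [[0.0, 0.0] for _ in range(20)] + [[10.0, 10.0] for _ in range(20)]
reference = scramble(real, seed=1)
rows = real + reference
labels = [1] * len(real) + [0] * len(reference)
tree = grow(rows, labels, depth=2)
print(predict(tree, [0.0, 0.0]))   # share of real rows in a real cluster
print(predict(tree, [0.0, 10.0]))  # off-diagonal region: scrambled rows only
```

Because every scrambled column is a permutation of the original column, any split on a single variable leaves equal numbers of real and scrambled rows on each side, so the root split cannot reduce the Gini impurity; the sketch keeps splitting anyway, and the second split exposes the dependence. Each root-to-leaf path reads as a mutually exclusive rule, and real observations that fall in leaves dominated by scrambled rows are natural outlier candidates, although the abstract does not say this is exactly how outliers were flagged.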