JSM Preliminary Online Program
This is the preliminary program for the 2006 Joint Statistical Meetings in Seattle, Washington.

The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.


Activity Number: 335
Type: Contributed
Date/Time: Tuesday, August 8, 2006, 2:00 PM to 3:50 PM
Sponsor: Section on Nonparametric Statistics
Abstract - #307005
Title: Scrambling Method for Cluster Analysis Using Supervised Learning
Author(s): Oksana Shcherbak*+
Companies: Union Bank of California
Address: 2673 Matera Lane, San Diego, CA 92108
Keywords: cluster analysis ; CART decision trees ; supervised learning ; multivariate analysis ; data segmentation ; outlier analysis
Abstract:

Finding high-density regions in the data space, also known as data segmentation, is a general problem that often arises in multivariate analysis and is an example of unsupervised learning. In this paper we present a scrambling technique that allows supervised learning methods such as CART decision trees to be used for unsupervised learning. We found the performance of several CART models, fitted to different datasets obtained with the scrambling method, to be quite strong and stable for high-dimensional problems. By examining the final models, we detected informative behavioral patterns in our population by producing mutually exclusive dependency rules, and we discovered interesting departures from those rules, which led to an effective analysis of outliers. We conclude that scrambling analysis holds promise as a tool for the effective detection of clusters and outliers in the data.
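A minimal Python sketch of the scrambling idea follows. It assumes that "scrambling" means shuffling each column of the data independently, which the abstract does not spell out: the scrambled copy keeps every variable's marginal distribution but loses the joint structure that forms clusters. Real rows are labeled 1 and scrambled rows 0, so a supervised learner such as a CART tree can be trained to tell them apart; each leaf of such a tree is a box-shaped rule, and its fraction of real rows shows whether the region is denser than independence alone would predict. The function names `scramble`, `make_supervised` and `rule_stats`, and the two-cluster toy data, are hypothetical illustrations, not the author's code; the tree-fitting step itself is left to whatever CART implementation is at hand.

```python
import random


def scramble(rows, seed=0):
    """Return a copy of `rows` with every column shuffled independently.

    Assumption: this is what "scrambling" means here. Each variable keeps
    its marginal distribution, but the dependence between variables (the
    structure that forms clusters) is destroyed.
    """
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*rows)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]


def make_supervised(rows, seed=0):
    """Label real rows 1 and their scrambled copy 0.

    Any supervised learner (for example a CART tree) can then be trained
    on the returned (X, y) pair to separate real from scrambled data.
    """
    fake = scramble(rows, seed)
    X = [list(row) for row in rows] + fake
    y = [1] * len(rows) + [0] * len(fake)
    return X, y


def rule_stats(X, y, box):
    """Count real and scrambled rows inside a box-shaped rule.

    `box` maps a column index to an inclusive (low, high) interval, the
    kind of region a path to a CART leaf describes. A real-row fraction
    well above 0.5 marks a region denser than independence would predict.
    """
    inside = [label for row, label in zip(X, y)
              if all(lo <= row[j] <= hi for j, (lo, hi) in box.items())]
    real = sum(inside)
    fake = len(inside) - real
    frac = real / len(inside) if inside else 0.0
    return real, fake, frac


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical toy data: two tight clusters on the diagonal of the plane.
    data = [[rng.gauss(c, 0.05), rng.gauss(c, 0.05)]
            for c in (0.0, 1.0) for _ in range(50)]
    X, y = make_supervised(data, seed=1)
    # Lower-left box holds one real cluster; the off-diagonal box holds none.
    print("on-diagonal:", rule_stats(X, y, {0: (-0.5, 0.5), 1: (-0.5, 0.5)}))
    print("off-diagonal:", rule_stats(X, y, {0: (-0.5, 0.5), 1: (0.5, 1.5)}))
```

On the toy data, the on-diagonal box should contain all 50 real rows of the lower-left cluster but only around 25 scrambled rows (the exact count depends on the shuffle), giving a real fraction near two thirds, while the off-diagonal box contains scrambled rows only. One plausible reading of the outlier step, not confirmed by the abstract, is that real rows falling in boxes dominated by scrambled rows are the ones flagged for review.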


  • The address information is for the authors that have a + after their name.
  • Authors who are presenting talks have a * after their name.

JSM 2006. For information, contact jsm@amstat.org or phone (888) 231-3473. If you have questions about the Continuing Education program, please contact the Education Department.
Revised April 2006