Activity Number: 211
Type: Contributed
Date/Time: Tuesday, August 13, 2002 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Computing*
Abstract - #301782
Title: Clustering Massive Datasets from Arbitrary Gaussian Mixture Populations
Author(s): Ranjan Maitra*+
Affiliation(s): University of Maryland, Baltimore County
Address: 1000 Hilltop Circle, Baltimore, Maryland, 21044
Keywords: clustering; massive datasets; Gaussian mixtures; classification
Abstract:
Clustering is a difficult problem, and the difficulty is compounded for massive datasets. I earlier developed a multi-stage algorithm for this problem for homogeneous Gaussian mixtures. This algorithm clustered an initial sample, filtered out observations that could be reasonably classified by these clusters, as determined by a statistical test, and iterated the procedure on the remainder. The class probabilities and dispersions were finally obtained to classify each observation in the dataset. I heuristically extend the methodology to arbitrary populations of Gaussian mixtures by replacing the statistical testing step with a step that identifies observations in the tails as those having the potential of being from hitherto unidentified clusters. Results on test experiments show promise.
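The multi-stage idea described in the abstract (cluster a sample, set aside the observations those clusters explain well, repeat on the remainder, then classify everything) can be illustrated with a minimal Python sketch. This is not the author's implementation: the abstract names neither the clustering method nor the statistical test, so the sketch assumes k-means with farthest-first initialization as the per-stage clustering step, a common spherical variance, and a scaled squared-distance cutoff as a stand-in for the test. The function names and the parameters `k`, `sample_size`, `threshold`, and `max_stages` are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centers):
    """Index of the center closest to point (ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def init_centers(points, k):
    """Farthest-first initialization: start at points[0], then repeatedly
    add the point farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations; an assumed stand-in for the clustering step."""
    centers = init_centers(points, k)
    assign = None
    for _ in range(max_iter):
        new_assign = [nearest(p, centers) for p in points]
        if new_assign == assign:
            break  # assignments stable, so the centers are stable too
        assign = new_assign
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers


def multi_stage_cluster(data, k, sample_size, threshold, max_stages=10, seed=0):
    """Cluster a sample, drop well-explained points, repeat on the remainder."""
    rng = random.Random(seed)
    dim = len(data[0])
    remaining = list(data)
    centers = []
    for _ in range(max_stages):
        if len(remaining) < k:
            break
        n = min(max(sample_size, k), len(remaining))
        sample = rng.sample(remaining, n)
        stage = kmeans(sample, k)
        # Pooled per-coordinate variance about the nearest stage center
        # (assumes a common spherical dispersion, i.e. the homogeneous case).
        ss = sum(sq_dist(p, stage[nearest(p, stage)]) for p in sample)
        var = max(ss / (len(sample) * dim), 1e-12)
        centers.extend(stage)
        # Stand-in for the statistical test: keep only observations whose
        # scaled squared distance to every stage center exceeds the cutoff.
        remaining = [p for p in remaining
                     if sq_dist(p, stage[nearest(p, stage)]) / var > threshold]
    # Final pass: classify every observation to its nearest center.
    labels = [nearest(p, centers) for p in data]
    return centers, labels
```

For two-dimensional data, a `threshold` of about 9.21 corresponds to the 0.99 quantile of a chi-square distribution with 2 degrees of freedom, which is one plausible choice under the spherical Gaussian assumption made here. The final pass uses only cluster centers; estimating class probabilities and dispersions, as the abstract describes, is left out of this sketch.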