JSM 2002

Activity Number:	211
Type:	Contributed
Date/Time:	Tuesday, August 13, 2002 : 10:30 AM to 12:20 PM
Sponsor:	Section on Statistical Computing*
Abstract - #301782
Title:	Clustering Massive Datasets from Arbitrary Gaussian Mixture Populations
Author(s):	Ranjan Maitra*+
Affiliation(s):	University of Maryland, Baltimore County
Address:	1000 Hilltop Circle, Baltimore, Maryland, 21044
Keywords:	clustering ; massive datasets ; Gaussian mixtures ; classification
Abstract:	Clustering is a difficult problem, and the difficulty is compounded for massive datasets. I earlier developed, for homogeneous Gaussian mixtures, a multi-stage algorithm for this problem. This algorithm clustered an initial sample, filtered out observations that could be reasonably classified by these clusters, as per the results of a statistical test, and iterated the procedure on the remainder. The class probabilities and dispersions were finally obtained to classify each observation in the dataset. I heuristically extend the methodology to arbitrary populations of Gaussian mixtures by replacing the statistical testing step with a step that identifies observations in the tails as those having the potential of being from hitherto unidentified clusters. Results on test experiments show promise.
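The sample, filter, and iterate loop described in the abstract can be illustrated with a minimal one-dimensional sketch. This is not the author's algorithm: it assumes a known common standard deviation `sigma`, swaps in single-linkage clustering on sorted values for the sample-clustering step, and uses a plain distance cutoff `z * sigma` as a stand-in for the statistical test. All function names, parameters, and default values below are hypothetical.

```python
import random


def cluster_sample(sample, gap, min_size):
    """Single-linkage clustering in 1-D: split the sorted sample at large gaps."""
    groups, current = [], []
    for x in sorted(sample):
        if current and x - current[-1] > gap:
            groups.append(current)
            current = []
        current.append(x)
    if current:
        groups.append(current)
    # Only groups with enough points count as identified clusters (assumption).
    return [sum(g) / len(g) for g in groups if len(g) >= min_size]


def multistage_cluster(data, sigma=1.0, sample_size=200, z=4.0,
                       min_size=5, max_stages=10, seed=0):
    """Cluster a sample, filter well-explained points, repeat on the remainder."""
    rng = random.Random(seed)
    centers, remaining = [], list(data)
    for _ in range(max_stages):
        if not remaining:
            break
        sample = rng.sample(remaining, min(sample_size, len(remaining)))
        new_centers = cluster_sample(sample, gap=4.0 * sigma, min_size=min_size)
        if not new_centers:
            break  # no new cluster identified in this stage
        centers.extend(new_centers)
        # Stand-in for the statistical test: keep only observations that lie
        # far from every center found so far.
        remaining = [x for x in remaining
                     if min(abs(x - c) for c in centers) > z * sigma]
    # Final pass: assign every observation to its nearest center.
    labels = [min(range(len(centers)), key=lambda j: abs(x - centers[j]))
              for x in data]
    return centers, labels


def demo_data(seed=1):
    """Two large clusters and one rare cluster that a small sample may miss."""
    rng = random.Random(seed)
    data = [rng.gauss(0.0, 1.0) for _ in range(3000)]
    data += [rng.gauss(20.0, 1.0) for _ in range(3000)]
    data += [rng.gauss(40.0, 1.0) for _ in range(60)]
    return data
```

Calling `multistage_cluster(demo_data())` returns the estimated centers and one label per observation. In this toy setting the rare cluster is often too thinly represented in the first sample to be identified, so it tends to surface in a later stage once the two large clusters have been filtered out, which is the motivation for iterating on the remainder.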

The views expressed here are those of the individual authors and not necessarily those of the ASA or its board, officers, or staff.
