Abstract #301782

The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.

JSM 2002 Abstract #301782
Activity Number: 211
Type: Contributed
Date/Time: Tuesday, August 13, 2002 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Computing*
Title: Clustering Massive Datasets from Arbitrary Gaussian Mixture Populations
Author(s): Ranjan Maitra*+
Affiliation(s): University of Maryland, Baltimore County
Address: 1000 Hilltop Circle, Baltimore, Maryland, 21044
Keywords: clustering ; massive datasets ; Gaussian mixtures ; classification

Clustering is a difficult problem, and the difficulty is compounded for massive datasets. I earlier developed a multi-stage algorithm for this problem for homogeneous Gaussian mixtures. The algorithm clustered an initial sample, filtered out the observations that could reasonably be classified into these clusters, as judged by a statistical test, and iterated the procedure on the remainder. The class probabilities and dispersions were then estimated and used to classify each observation in the dataset. I heuristically extend the methodology to arbitrary populations of Gaussian mixtures by replacing the statistical testing step with a step that identifies observations in the tails as potentially belonging to hitherto unidentified clusters. Results on test experiments show promise.
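The sample, filter, and iterate loop described in the abstract can be illustrated with a rough sketch. This is not the author's implementation: it assumes one-dimensional data, uses plain k-means as a stand-in for the unspecified clustering step, and uses a simple standardized-distance cutoff as a stand-in for the step that flags tail observations. The names `kmeans_1d`, `log_density`, and `multistage_cluster`, and the parameters `k`, `sample_size`, `z_cut`, and `max_stages`, are all hypothetical.

```python
import math
import random
import statistics


def kmeans_1d(xs, k, rng, iters=25):
    """Plain 1-D k-means (a stand-in clusterer); returns [(mean, sd), ...]."""
    centers = rng.sample(xs, k)
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda i: abs(x - centers[i]))
            groups[nearest].append(x)
        centers = [statistics.fmean(g) if g else c
                   for g, c in zip(groups, centers)]
    # Drop groups too small to give a dispersion estimate.
    return [(statistics.fmean(g), max(statistics.stdev(g), 1e-6))
            for g in groups if len(g) >= 2]


def log_density(x, mean, sd):
    """Gaussian log-density, up to an additive constant."""
    return -math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def multistage_cluster(data, k=2, sample_size=200, z_cut=3.0,
                       max_stages=5, seed=0):
    """Assumed sketch: cluster a sample, keep only tail observations, repeat."""
    rng = random.Random(seed)
    remainder = list(data)
    clusters = []  # (mean, sd) of every cluster found so far
    for _ in range(max_stages):
        if len(remainder) < 2 * k:
            break
        sample = rng.sample(remainder, min(sample_size, len(remainder)))
        found = kmeans_1d(sample, k, rng)
        if not found:
            break
        clusters.extend(found)
        # Filter step: keep observations lying in the tails of every known
        # cluster, since they may come from clusters not yet identified.
        remainder = [x for x in remainder
                     if all(abs(x - m) / s > z_cut for m, s in clusters)]
    if not clusters:
        raise ValueError("need at least 2 * k observations")

    # Final pass: estimate class probabilities and dispersions from a hard
    # assignment, then classify every observation in the dataset.
    hard = [max(range(len(clusters)),
                key=lambda j: log_density(x, *clusters[j])) for x in data]
    params = []  # (proportion, mean, sd) per non-empty cluster
    for j, (_, sd0) in enumerate(clusters):
        members = [x for x, h in zip(data, hard) if h == j]
        if members:
            sd = statistics.stdev(members) if len(members) > 1 else sd0
            params.append((len(members) / len(data),
                           statistics.fmean(members), max(sd, 1e-6)))

    def classify(x):
        return max(range(len(params)),
                   key=lambda j: math.log(params[j][0])
                   + log_density(x, *params[j][1:]))

    return params, [classify(x) for x in data], classify
```

With well-separated groups, one of them rare, the rare group tends to survive the first filter and be picked up in a later stage. The cutoff `z_cut` and the per-stage `k` are tuning choices made for this sketch only; the abstract does not specify them.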

  • The address information is for the authors that have a + after their name.
  • Authors who are presenting talks have a * after their name.

For information, contact meetings@amstat.org or phone (703) 684-1221.

Revised March 2002