Abstract #300756


The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.





JSM 2002 Abstract #300756
Activity Number: 258
Type: Contributed
Date/Time: Tuesday, August 13, 2002 : 2:00 PM to 3:50 PM
Sponsor: Biometrics Section*
Title: Distance-based Estimation of the Number of Components in Multivariate Mixture Models: A Tool for Analyzing Gene Expression Data
Author(s): Surajit Ray*+ and Bruce Lindsay
Affiliation(s): Pennsylvania State University and Pennsylvania State University
Address: 325 Thomas Building, University Park, Pennsylvania, 16802, USA
Keywords: Non-parametric Confidence Sets; Bootstrap; Statistical Distance Estimation; Clustering; Risk Analysis; Micro-array Data
Abstract:

Multivariate mixture models provide a convenient method for density estimation and model-based clustering, and they offer excellent insight into the actual data generation process. However, choosing the number of components (k) in a statistically meaningful way is still a subject of considerable research. Available methods for estimating k include optimizing AIC and BIC, gradient checking in a non-parametric mixture model setup, and Bayesian approaches with entropy distances. In this paper we present rules for selecting k based on a one-sided non-parametric confidence set generated by a quadratic distance measure. The goal of this methodology is to find the minimal number of components needed to adequately describe the true distribution. We also present results for selecting k based on a risk analysis that includes a penalty for overfitting; here the goal is to find the fitted mixture that is closest to the true distribution. Finally, we fine-tune our methods to analyze gene expression data from micro-arrays and compare them with competing methods.
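
The abstract does not specify the kernel, the fitting procedure, or how the confidence set is calibrated. As a rough illustration only, the following Python sketch assumes a one-dimensional Gaussian kernel with bandwidth h, for which the quadratic distance between the empirical distribution and a Gaussian mixture has a closed form. The function names, the hand-specified candidate mixtures, and the fixed tolerance (a stand-in for the bootstrap-based confidence bound) are all hypothetical and are not taken from the paper.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def quadratic_distance(data, weights, means, variances, h=0.5):
    """Quadratic distance between the empirical distribution of `data` and a
    1-D Gaussian mixture, using the kernel K(x, y) = N(x - y; 0, h^2).
    The kernel choice and bandwidth are assumptions made for illustration."""
    n = len(data)
    h2 = h * h
    comps = list(zip(weights, means, variances))
    # Empirical-empirical term: average kernel value over all pairs of points.
    ee = sum(normal_pdf(xi, xj, h2) for xi in data for xj in data) / (n * n)
    # Empirical-model term: the kernel convolved with N(m, v) is N(m, v + h^2).
    em = sum(w * normal_pdf(xi, m, v + h2) for xi in data for w, m, v in comps) / n
    # Model-model term: double convolution gives variance v1 + v2 + h^2.
    mm = sum(w1 * w2 * normal_pdf(m1, m2, v1 + v2 + h2)
             for w1, m1, v1 in comps for w2, m2, v2 in comps)
    return ee - 2.0 * em + mm

def smallest_adequate_k(data, candidates, tolerance, h=0.5):
    """Return the smallest k whose mixture lies within `tolerance` of the data.
    `tolerance` is a hypothetical fixed threshold, not a calibrated bound."""
    for k in sorted(candidates):
        weights, means, variances = candidates[k]
        if quadratic_distance(data, weights, means, variances, h) <= tolerance:
            return k
    return None

# Toy data with two well-separated groups (made up for this sketch).
data = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9]

# Hand-specified candidate mixtures: k -> (weights, means, variances).
candidates = {
    1: ([1.0], [2.5], [6.25]),
    2: ([0.5, 0.5], [0.0, 5.0], [0.01, 0.01]),
}

selected_k = smallest_adequate_k(data, candidates, tolerance=0.01)
```

In practice the candidate mixtures would come from a fitting step such as EM, and the threshold would be derived from the sampling variability of the distance rather than fixed by hand.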


  • The address given is for authors who have a + after their name.
  • Authors who are presenting talks have a * after their name.



Revised March 2002