JSM 2004 - Toronto

Abstract #302010

This is the preliminary program for the 2004 Joint Statistical Meetings in Toronto, Canada. Currently included in this program is the "technical" program, schedule of invited, topic contributed, regular contributed and poster sessions; Continuing Education courses (August 7-10, 2004); and Committee and Business Meetings. This on-line program will be updated frequently to reflect the most current revisions.


The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.





Activity Number: 90
Type: Contributed
Date/Time: Monday, August 9, 2004 : 9:00 AM to 10:50 AM
Sponsor: Biometrics Section
Abstract - #302010
Title: A Gap-statistic-based Method of Determining the Number of Clusters
Author(s): Mingjin Yan*+
Companies: Virginia Polytechnic Institute and State University
Address: 1776 Liberty Lane Apt. C34, Blacksburg, VA 24060
Keywords: Gap statistic ; K-means ; clustering
Abstract:

Cluster analysis has attracted increasing attention as an exploratory tool in the "unsupervised" learning of gene expression data. Currently, the major concerns in cluster analysis are twofold: which clustering method to use and how to determine the number of clusters in a dataset. Despite its importance, inferring the correct number of groups in a dataset is not easy because there is no clear definition of a "cluster." Widely used clustering algorithms (e.g., K-means or hierarchical clustering) give heuristic rather than definitive answers about the number of groups. Tibshirani et al. proposed a method that formalizes the estimation of the number of clusters via the Gap statistic. A new technique based on the Gap statistic is developed to find the optimal number of clusters. The performance of this new method and that of the Gap statistic are compared using both simulated and real-world datasets.
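
For readers unfamiliar with the Gap statistic of Tibshirani et al. that the abstract refers to, the following is a minimal sketch of that original procedure combined with K-means. It is not the new method proposed in this abstract, whose details are not given here. The sketch computes Gap(k) as the average of log W_k over uniform reference datasets minus log W_k for the observed data, where W_k is the within-cluster sum of squares, and picks the smallest k with Gap(k) >= Gap(k+1) - s_(k+1). All function names (kmeans_wss, gap_statistic, demo_data), the farthest-first seeding, the bounding-box reference distribution, and the parameter defaults are assumptions made for illustration.

```python
import math
import random


def kmeans_wss(points, k, rng, n_iter=50):
    """Lloyd's K-means; returns the within-cluster sum of squares W_k.

    Seeding is farthest-first (an illustrative choice): a random first
    center, then repeatedly the point farthest from the chosen centers.
    """
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[j].append(p)
        # Move each center to its cluster mean; keep it if the cluster is empty.
        new_centers = [
            tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)


def log_wk(points, k, rng, n_init=3):
    """log W_k, taking the best of a few restarts."""
    return math.log(min(kmeans_wss(points, k, rng) for _ in range(n_init)))


def gap_statistic(points, k_max, n_refs=10, seed=0):
    """Return (estimated number of clusters, list of Gap(1..k_max))."""
    rng = random.Random(seed)
    lows = [min(xs) for xs in zip(*points)]
    highs = [max(xs) for xs in zip(*points)]
    gaps, errs = [], []
    for k in range(1, k_max + 1):
        ref_logs = []
        for _ in range(n_refs):
            # Reference data: uniform over the bounding box of the observations.
            ref = [
                tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
                for _ in points
            ]
            ref_logs.append(log_wk(ref, k, rng))
        mean_ref = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((v - mean_ref) ** 2 for v in ref_logs) / n_refs)
        gaps.append(mean_ref - log_wk(points, k, rng))
        errs.append(sd * math.sqrt(1 + 1 / n_refs))
    # Smallest k with Gap(k) >= Gap(k+1) - s_(k+1); gaps[k - 1] holds Gap(k).
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - errs[k]:
            return k, gaps
    return k_max, gaps


def demo_data(seed=1):
    """Three well-separated 2-D Gaussian blobs (synthetic, illustration only)."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 10.0)]
    return [
        (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
        for cx, cy in centers
        for _ in range(30)
    ]


if __name__ == "__main__":
    k_hat, gaps = gap_statistic(demo_data(), k_max=5)
    print("estimated number of clusters:", k_hat)
    print("gap values:", [round(g, 3) for g in gaps])
```

On well-separated synthetic data such as demo_data(), the rule would be expected to select three clusters. With overlapping or elongated clusters the choice of reference distribution and of the selection rule matters much more, which is the kind of situation the comparison in this abstract addresses.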


  • The address information is for the authors that have a + after their name.
  • Authors who are presenting talks have a * after their name.


JSM 2004 For information, contact jsm@amstat.org or phone (888) 231-3473. If you have questions about the Continuing Education program, please contact the Education Department.
Revised March 2004