JSM 2005 - Minneapolis

Abstract #304314

This is the preliminary program for the 2005 Joint Statistical Meetings in Minneapolis, Minnesota. It currently includes the technical program (the schedule of invited, topic-contributed, regular contributed, and poster sessions); Continuing Education courses (August 7-10, 2005); and committee and business meetings. This online program will be updated frequently to reflect the most current revisions.




The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.


Meeting room names in the program are preceded by letters that designate the facility in which the room is located:

  • Minneapolis Convention Center = “MCC”
  • Hilton Minneapolis Hotel = “H”
  • Hyatt Regency Minneapolis = “HY”




Activity Number: 65
Type: Contributed
Date/Time: Sunday, August 7, 2005, 4:00 PM to 5:50 PM
Sponsor: General Methodology
Abstract - #304314
Title: Determining the Number of Clusters in Data via the Weighted Gap Statistic
Author(s): Mingjin Yan*+
Companies: Virginia Polytechnic Institute and State University
Address: 1776 Liberty Ln Apt C34, Blacksburg, VA, 24060, United States
Keywords: cluster analysis ; the gap statistic ; the weighted gap statistic
Abstract:

Estimating the number of clusters in a dataset is a crucial step in cluster analysis. Although many methods have been proposed for this problem, no universally best performer has yet been found. In this article, we propose new approaches to estimating the optimal number of clusters in data, based on the weighted within-clusters sum of errors, a robust measure of within-cluster homogeneity. The methods are applicable when the input data contain only continuous measurements and a partitioning clustering method is used. The proposed methods are compared with existing approaches using both simulated and real data examples. The comparisons show that the proposed methods are highly effective in determining the true number of clusters in a dataset.
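The abstract gives no formulas, so the following Python sketch is only an illustration of the general idea, not the author's method. It starts from the standard gap statistic of Tibshirani, Walther, and Hastie, which compares the log within-cluster dispersion of the data with that of uniform reference data, and adds an optional weighting. The weighting used here (dividing each cluster's sum of squared errors by n_r - 1) is an assumption, as are the function names `kmeans_labels`, `within_dispersion`, and `estimate_k` and the use of a single-start k-means as the partitioning method.

```python
import numpy as np

def kmeans_labels(X, k, rng, n_iter=50):
    """Plain Lloyd's algorithm with one random start; returns a label per row."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for r in range(k):
            if np.any(labels == r):  # an empty cluster keeps its old center
                centers[r] = X[labels == r].mean(axis=0)
    return labels

def within_dispersion(X, labels, weighted=False):
    """Sum over clusters of the within-cluster sum of squared errors."""
    X = np.asarray(X, dtype=float)
    total = 0.0
    for r in np.unique(labels):
        pts = X[labels == r]
        n_r = len(pts)
        ss = ((pts - pts.mean(axis=0)) ** 2).sum()
        if weighted:
            # ASSUMED weighting (not given in the abstract): divide by n_r - 1.
            # Singleton clusters contribute zero.
            total += ss / (n_r - 1) if n_r > 1 else 0.0
        else:
            total += ss
    return total

def estimate_k(X, k_max=5, n_ref=10, weighted=False, seed=0):
    """Gap-statistic estimate of the number of clusters, for k = 1..k_max."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, s = [], []
    for k in range(1, k_max + 1):
        log_w = np.log(within_dispersion(X, kmeans_labels(X, k, rng), weighted))
        ref = []
        for _ in range(n_ref):
            # Reference data: uniform over the bounding box of X.
            R = rng.uniform(lo, hi, size=X.shape)
            ref.append(np.log(within_dispersion(R, kmeans_labels(R, k, rng), weighted)))
        gaps.append(np.mean(ref) - log_w)
        s.append(np.std(ref) * np.sqrt(1 + 1 / n_ref))
    # Smallest k with gap(k) >= gap(k+1) - s(k+1); fall back to k_max.
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - s[k]:
            return k, gaps
    return k_max, gaps
```

Calling `estimate_k(data, weighted=True)` on an array of continuous measurements returns the estimated number of clusters and the list of gap values. Because the sketch uses a single random start for k-means, results can vary with `seed`; a more careful version would use several restarts.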


  • The address information is for authors who have a + after their name.
  • Authors who are presenting talks have a * after their name.


For information about JSM 2005, contact jsm@amstat.org or phone (888) 231-3473. If you have questions about the Continuing Education program, please contact the Education Department.
Revised March 2005