JSM 2005 - Minneapolis

Abstract #302921

This is the preliminary program for the 2005 Joint Statistical Meetings in Minneapolis, Minnesota. It currently includes the "technical" program (the schedule of invited, topic-contributed, regular contributed, and poster sessions); Continuing Education courses (August 7-10, 2005); and Committee and Business Meetings. This online program will be updated frequently to reflect the most current revisions.

The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.


In the Program, each meeting room name is preceded by letters that designate the facility in which the room is located:

  • Minneapolis Convention Center = “MCC”
  • Hilton Minneapolis Hotel = “H”
  • Hyatt Regency Minneapolis = “HY”

Activity Number: 520
Type: Contributed
Date/Time: Thursday, August 11, 2005 : 10:30 AM to 12:20 PM
Sponsor: General Methodology
Abstract - #302921
Title: Clustering with Mixed-type Attributes
Author(s): Jong-Min Kim*+ and Seoung-San Chae and William D. Warde
Companies: University of Minnesota and Daejeon University and Oklahoma State University
Address: 2380 Science Building, Morris, MN, 56267, United States
Keywords: Agglomerative Clustering Algorithm ; Rand's C Statistic ; Mixed-type Objects ; Association Coefficient
Abstract:

Agglomerative clustering algorithms are well-known methods for clustering sets of data. However, because they operate on numeric values, they cannot be applied directly to real-world data containing categorical values. In this research, we present sets of agglomerative clustering algorithms that extend to categorical domains and to domains with mixed numeric and categorical values. The simple matching, Jaccard, and Yule coefficients are used to measure similarity between objects. Similarities are converted to dissimilarities using the formula d_ij = (1 - S_ij)^(1/2), where S_ij is the similarity coefficient, under whichever of these definitions is in use, between the i-th and j-th objects. We use a dataset to demonstrate the clustering performance of the algorithms under the different similarity measures. Our experiments on real-world datasets show that agglomerative clustering algorithms are efficient when clustering data with mixed numeric and categorical attributes. Rand's (1971) C statistic is used to measure the retrieval ability of the agglomerative clustering algorithm under the different ways of calculating dissimilarity between objects.
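
The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes attributes are coded as 0/1 vectors, takes the Yule coefficient to be Yule's Q, and picks arbitrary conventions for zero denominators. It reads Rand's C statistic as the proportion of object pairs on which two partitions agree. All function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def counts(x, y):
    """Return the 2x2 agreement counts (a, b, c, d) for two 0/1 vectors.

    a: both 1, b: x=1 and y=0, c: x=0 and y=1, d: both 0.
    """
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)
    b = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)
    c = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)
    d = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)
    return a, b, c, d


def simple_matching(x, y):
    """Simple matching coefficient: share of attributes on which x and y agree."""
    a, b, c, d = counts(x, y)
    return (a + d) / (a + b + c + d)


def jaccard(x, y):
    """Jaccard coefficient: joint presences over attributes present in either."""
    a, b, c, _ = counts(x, y)
    # Assumption: two all-zero vectors are treated as identical (S = 1).
    return a / (a + b + c) if (a + b + c) else 1.0


def yule(x, y):
    """Yule's Q, (ad - bc) / (ad + bc), which lies in [-1, 1]."""
    a, b, c, d = counts(x, y)
    denom = a * d + b * c
    # Assumption: an undefined Q (zero denominator) is reported as 0.
    return (a * d - b * c) / denom if denom else 0.0


def dissimilarity(s):
    """Convert a similarity S_ij into d_ij = (1 - S_ij)^(1/2)."""
    return sqrt(1.0 - s)


def rand_index(labels_a, labels_b):
    """Proportion of object pairs treated alike by two partitions.

    A pair counts as an agreement when both partitions place the two
    objects together or both place them apart.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        1
        for i, j in pairs
        if (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
    )
    return agree / len(pairs)


# Usage: two objects described by four binary attributes.
x = [1, 1, 0, 0]
y = [1, 0, 1, 0]
for name, coef in [("simple matching", simple_matching),
                   ("Jaccard", jaccard),
                   ("Yule", yule)]:
    s = coef(x, y)
    print(name, s, dissimilarity(s))
```

A matrix of such d_ij values could then be passed to any agglomerative routine that accepts precomputed dissimilarities, and the resulting partition compared with a known grouping through rand_index.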


  • The address information is for the authors that have a + after their name.
  • Authors who are presenting talks have a * after their name.

JSM 2005: For information, contact jsm@amstat.org or phone (888) 231-3473. If you have questions about the Continuing Education program, please contact the Education Department.
Revised March 2005