JSM 2012 Online Program

The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.

Abstract Details

Activity Number: 402
Type: Contributed
Date/Time: Tuesday, July 31, 2012, 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Learning and Data Mining
Abstract - #306853
Title: On the Estimation of Similarity Indices in Clustering Evaluation
Author(s): John Ramey*+
Companies: Baylor University
Address: Department of Statistical Science, Waco, TX, 76798-7140, United States
Keywords: unsupervised learning; clustering evaluation; clustering similarity; Rand index; Jaccard index
Abstract:

Evaluating clustering algorithms has been argued to be as important as the clustering itself, yet evaluation methods have not been well studied and are less straightforward than those for supervised learning models. In practice, clusters are generally assessed by subjective judgment of visualizations, which tend to oversimplify the geometric structure in the data and can be both misleading and difficult to interpret. Several proposed evaluation methods use similarity coefficients, such as the Rand and Jaccard indices, to compare the clusters produced by candidate clustering methods, often in combination with resampling techniques. We show that the common approach of estimating these similarity coefficients from contingency tables is naive and yields extremely biased estimators, which can lead to invalid conclusions about the resulting clusters. We present a Bayesian approach that estimates the similarity coefficients under a more reasonable likelihood, and we demonstrate that it improves the estimation of the similarity coefficients and thereby the clustering assessment.
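
For readers unfamiliar with the contingency-table calculation mentioned in the abstract, the following is a minimal Python sketch of the standard pair-counting definitions of the Rand and Jaccard indices for two labelings of the same observations. It illustrates only the common estimator that the abstract describes as naive. The Bayesian alternative is not specified in the abstract and is not attempted here. The function names (`pair_counts`, `rand_index`, `jaccard_index`) are hypothetical, and returning 1.0 for the Jaccard index when no pair is co-clustered in either labeling is an assumed convention.

```python
from collections import Counter
from math import comb


def pair_counts(labels_a, labels_b):
    """Classify all pairs of observations by co-membership in two clusterings.

    Returns (n11, n10, n01, n00): pairs placed together in both clusterings,
    together only in A, together only in B, and together in neither.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two observations are required")

    # Contingency table: number of observations in cluster i of A and j of B.
    table = Counter(zip(labels_a, labels_b))
    rows = Counter(labels_a)  # cluster sizes in A (row margins)
    cols = Counter(labels_b)  # cluster sizes in B (column margins)

    n11 = sum(comb(c, 2) for c in table.values())
    same_a = sum(comb(c, 2) for c in rows.values())
    same_b = sum(comb(c, 2) for c in cols.values())
    n10 = same_a - n11
    n01 = same_b - n11
    n00 = comb(n, 2) - n11 - n10 - n01
    return n11, n10, n01, n00


def rand_index(labels_a, labels_b):
    """Proportion of pairs on which the two clusterings agree."""
    n11, n10, n01, n00 = pair_counts(labels_a, labels_b)
    return (n11 + n00) / (n11 + n10 + n01 + n00)


def jaccard_index(labels_a, labels_b):
    """Pairs together in both, over pairs together in at least one."""
    n11, n10, n01, _ = pair_counts(labels_a, labels_b)
    denom = n11 + n10 + n01
    # Assumed convention: two all-singleton clusterings count as identical.
    return n11 / denom if denom else 1.0
```

As a usage example, `rand_index([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2])` evaluates to 0.8 and `jaccard_index` on the same inputs to 0.4, since 2 of the 15 pairs are co-clustered in both labelings, 3 in only one of them, and 10 in neither. Both indices depend only on co-membership, so relabeling the clusters leaves them unchanged.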


Address information is given for authors who have a + after their name.
Authors who are presenting talks have a * after their name.
