Abstract Details

Activity Number: 655
Type: Contributed
Date/Time: Thursday, August 4, 2016 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Learning and Data Science
Abstract #318756
Title: On Assessing the Difficulty of a Clustering Problem: The Introduction of Sensitivity and Specificity to Cluster Analysis
Author(s): Jonathon O'Brien*
Keywords: clustering ; performance indices ; sensitivity ; specificity ; mixture model ; classification

Historically, the field of clustering has focused on partitioning datasets with little regard for the underlying data-generating process. This has left serious unresolved questions about how to interpret clustering results and how algorithms are affected by randomness. We consider the nature of a clustering problem from a statistical perspective, focusing on population-level models. From this population-based perspective, we discuss the difference between classifiers, clusterings, and linkage assignments, and we propose new indices that put cluster validation into the well-known framework of sensitivity and specificity. Dozens of indices have been proposed to compare clusterings, but the arguments for selecting one index over another have not been well understood. The framework we propose provides a clear interpretation of the indices and, when tested in a supervised setting, enables researchers to assess the difficulty of their clustering problem. In turn, this enables far stronger interpretations of clustering results.
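
The abstract does not define its indices, so the following is only an illustrative sketch of one common way to read sensitivity and specificity in a clustering context, namely by counting pairs of points. It assumes true class labels are available (the supervised setting mentioned above) and treats "same true class" as the positive condition and "same cluster" as the positive prediction. The function name `pairwise_sensitivity_specificity` and this pair-counting definition are assumptions made for illustration, not the author's proposed indices.

```python
from itertools import combinations


def pairwise_sensitivity_specificity(true_labels, cluster_labels):
    """Pair-counting sensitivity and specificity (illustrative assumption).

    A pair of points is a "positive" if both share a true class, and is
    "predicted positive" if the clustering puts both in the same cluster.
    """
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label sequences must have the same length")

    tp = fn = fp = tn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_class = true_labels[i] == true_labels[j]
        same_cluster = cluster_labels[i] == cluster_labels[j]
        if same_class and same_cluster:
            tp += 1  # linked in truth and linked by the clustering
        elif same_class:
            fn += 1  # linked in truth but split by the clustering
        elif same_cluster:
            fp += 1  # separate in truth but merged by the clustering
        else:
            tn += 1  # separate in truth and separate in the clustering

    # Guard against degenerate inputs with no positive or no negative pairs.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy data: one point from class 0 is placed in the wrong cluster.
truth = [0, 0, 0, 1, 1, 1]
found = [0, 0, 1, 1, 1, 1]
sens, spec = pairwise_sensitivity_specificity(truth, found)
```

Because the counts depend only on whether pairs are linked, the result is unchanged when cluster labels are permuted, which is why a pair-based reading is convenient when cluster names carry no meaning.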

Authors who are presenting talks have a * after their name.

Back to the full JSM 2016 program

Copyright © American Statistical Association