JSM 2016 Online Program

Abstract Details

Activity Number:	655
Type:	Contributed
Date/Time:	Thursday, August 4, 2016 : 8:30 AM to 10:20 AM
Sponsor:	Section on Statistical Learning and Data Science
Abstract #318756
Title:	On Assessing the Difficulty of a Clustering Problem: The Introduction of Sensitivity and Specificity to Cluster Analysis
Author(s):	Jonathon O'Brien*
Companies:
Keywords:	clustering ; performance indices ; sensitivity ; specificity ; mixture model ; classification
Abstract:	Historically, the field of clustering has focused on partitioning datasets with little regard for the underlying data-generating process. This has left serious unresolved questions about how to interpret clustering results and how algorithms are affected by randomness. We consider the nature of a clustering problem from a statistical perspective, focusing on population-level models. From this population-based perspective, we discuss the differences between classifiers, clusterings, and linkage assignments, and we propose new indices that place cluster validation within the well-known framework of sensitivity and specificity. Dozens of indices have been proposed for comparing clusterings, but the arguments for selecting one index over another have not been well understood. The framework we propose gives the indices a clear interpretation and, when tested in a supervised setting, enables researchers to assess the difficulty of their clustering problem. In turn, this allows far stronger interpretations of clustering results.
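The abstract does not define the proposed indices. A minimal sketch follows, assuming one common pair-counting reading of "linkage assignments": every pair of observations is either linked (same cluster) or not. Linkages in an estimated clustering are then scored against a known reference partition, as in the supervised setting the abstract mentions. The function names `pair_confusion` and `linkage_sensitivity_specificity` are hypothetical and are not the author's indices. Sensitivity is the share of truly linked pairs that the clustering also links. Specificity is the share of truly unlinked pairs that it keeps apart.

```python
from itertools import combinations
from typing import Hashable, Sequence, Tuple


def pair_confusion(truth: Sequence[Hashable],
                   estimate: Sequence[Hashable]) -> Tuple[int, int, int, int]:
    """Count observation pairs by linkage status (hypothetical helper).

    A pair is "linked" when both observations share a label. Returns
    (tp, fp, fn, tn) comparing estimated linkages with reference linkages.
    """
    if len(truth) != len(estimate):
        raise ValueError("label sequences must have equal length")
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(truth)), 2):
        linked_true = truth[i] == truth[j]
        linked_est = estimate[i] == estimate[j]
        if linked_true and linked_est:
            tp += 1          # linked in both partitions
        elif linked_est:
            fp += 1          # linked only in the estimate
        elif linked_true:
            fn += 1          # linked only in the reference
        else:
            tn += 1          # separated in both partitions
    return tp, fp, fn, tn


def linkage_sensitivity_specificity(truth: Sequence[Hashable],
                                    estimate: Sequence[Hashable]) -> Tuple[float, float]:
    """Pair-level sensitivity and specificity (an assumed definition)."""
    tp, fp, fn, tn = pair_confusion(truth, estimate)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    estimate = [0, 0, 1, 1, 1, 1]   # observation 2 is misassigned
    sens, spec = linkage_sensitivity_specificity(truth, estimate)
    print(round(sens, 3), round(spec, 3))  # prints "0.667 0.667"
```

Because only linkages are compared, the scores do not depend on how cluster labels are named. Relabelling the estimate as `[5, 5, 7, 7, 7, 7]` gives the same result. Repeating the comparison over data simulated from a known mixture model would give one rough gauge of how hard the clustering problem is.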

Authors who are presenting talks have a * after their name.

Copyright © American Statistical Association