Abstract:
|
Clustering is a tool used throughout science to divide data into meaningful groups. Clusterability is a newer field that quantifies the inherent cluster structure in a dataset. A clusterability test, applied before clustering, serves as a validation tool that alerts researchers when their data lack inherent cluster structure. Such data are deemed "unclusterable", and cluster analysis should not be applied to them. We compare clusterability methods for their ability to identify data as containing -- or, critically, NOT containing -- evidence of multiple inherent clusters. Simulations evaluate type I error and power, as well as behavior for data with small clusters. Methods are applied to real datasets from a variety of fields, including biology, economics, and political science. Finally, we discuss the implementation of clusterability tests in standard software.
|