Online Program

Thursday, May 17
Machine Learning
Optimization
Thu, May 17, 10:30 AM - 12:00 PM
Lake Fairfax B
 

Variable Selection for Consistent Clustering (304545)

Rebecca Nugent, Carnegie Mellon University 
Samuel Ventura, Carnegie Mellon University 
*Ronald Joseph Yurko, Carnegie Mellon University 

Keywords: clustering, variable selection, Adjusted Rand Index, bootstrap

A common problem in cluster analysis is that different methods produce different clusters for the same data set. We might be interested in discovering which clusters (if any) are consistent across methods. Similar in framework to the maximum clustering similarity (MCS) method of Albatineh and Niewiadomska-Bugaj (2011), this paper describes an approach that simultaneously selects the variables and the number of clusters yielding consistent clustering results. Following Raftery and Dean (2006), a greedy search algorithm finds the set of variables and number of clusters with the highest level of consistency, as measured by the adjusted Rand index (ARI) of Hubert and Arabie (1985). Additionally, we address variation and incorporate confidence in our selections through bootstrapping, where the next choice is based on a distributional overlap measure. We present results for both simulated and benchmark clustering data sets.
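
To make the consistency measure concrete, the following is a minimal Python sketch, not the authors' implementation. It computes the Hubert-Arabie ARI between two label vectors and wraps it in a simplified forward-only greedy search. The names `adjusted_rand_index`, `greedy_select`, and the `score` callback are hypothetical; `score(variables, k)` is assumed to cluster the data on the given variables with two or more methods and return their agreement (for example, the ARI). The bootstrap step and the distributional overlap measure are not specified in the abstract and are omitted here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie adjusted Rand index between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0
    # Pair counts from the contingency table cells and from its two margins.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case (e.g. both partitions are one cluster): treated as agreement.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def greedy_select(variables, cluster_counts, score):
    """Forward-only greedy search (a simplification): add the variable, with its
    best number of clusters, that most improves the hypothetical score(vars, k)."""
    selected, best_k, best_score = [], None, float("-inf")
    remaining = list(variables)
    while remaining:
        candidates = [
            (score(selected + [v], k), v, k)
            for v in remaining
            for k in cluster_counts
        ]
        top_score, top_var, top_k = max(candidates, key=lambda c: c[0])
        if top_score <= best_score:
            break  # no candidate improves consistency, so stop
        selected.append(top_var)
        remaining.remove(top_var)
        best_k, best_score = top_k, top_score
    return selected, best_k, best_score
```

The ARI is invariant to relabeling, so `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` reports perfect agreement. A search in the style of Raftery and Dean (2006) would also consider removing variables at each step; that is left out of this sketch to keep it short.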