Conference Program

Return to main conference page

All Times ET

Friday, June 10
Computational Statistics
Cluster and Graphical Analyses
Fri, Jun 10, 11:30 AM - 1:00 PM
Cambria
 

A New Algorithm for Robust Affine-Invariant Clustering (310117)

Andrews T. Anum, The University of Texas at El Paso 
*Michael Pokojovy, The University of Texas at El Paso 

Keywords: cluster analysis, affine invariant, robust

Cluster analysis is an unsupervised machine learning technique commonly employed to partition a dataset into distinct categories referred to as clusters. Cluster analysis has a wide range of applications including, but not limited to, computer science, business and marketing, engineering, cybersecurity, healthcare, bioinformatics, etc. The k-means algorithm is a prominent distance-based clustering method that aims to split a dataset into k groups as to minimize within-cluster sum of squares. Despite overwhelming popularity, the algorithm is neither invariant under non-singular linear transformations, e.g., change of units of measurement, nor robust, i.e., can be unduly influenced by outliers. We develop a new clustering algorithm based on minimizing the negative “trimmed” log-likelihood function proposed by García-Escudero et al. (2008) furnishing an iterative procedure akin to the well-known Lloyd’s algorithm used in k-means. Given any initial partitioning, our C-step iteratively updates the former as to reduce the objective function, thus, converging in a finite number of steps. Being local in its nature, our algorithm relies on multiple suitably chosen initial “warmstarts” to converge to a global minimum. Focusing on lower space dimensions, a simulation study was performed to assess the performance of our proposed algorithm and compare it to that of k-means over a wide variety of scenarios, including various cluster shapes and sizes, number of clusters and type of linear transformations applied. Empirical results strongly indicate that our proposed cluster algorithm has the potential to serve as a robust, affine invariant, yet computationally attractive alternative to the conventional k-means algorithm. In our future investigations, we intend to compare the convergence speed of our algorithm to that of MCLUST by García-Escudero et al. (2008), including higher dimensions p.