Activity Number:

486
Computing Kaleidoscope

Type:

Contributed

Date/Time:

Wednesday, August 1, 2018 : 8:30 AM to 10:12 AM

Sponsor:

Section on Statistical Computing

Abstract #330615

Presentation

Title:

A Transformation-Based K-Means Algorithm for Skewed Data

Author(s):

Nicholas S Berry* and Ranjan Maitra

Companies:

Iowa State University and Iowa State University

Keywords:

k-means;
clustering;
algorithm;
computation;
unsupervised;
transformation

Abstract:

The k-means algorithm is a widely used approach for getting an unsupervised first look at a dataset. While prized mainly for its simplicity and speed, k-means generally lacks flexibility. Its most notable limitations are its inability to accurately cluster data whose clusters are non-spherical and its requirement that the number of clusters be specified in advance. This presentation outlines an extension of the k-means algorithm that estimates an appropriate transformation for the data in parallel with clustering the transformed data. The implementation is general in the sense that the user passes a parametric transformation to the algorithm, and the parameters of that transformation are estimated to give an optimal clustering of the dataset. Additionally, we extend existing methods for estimating the number of clusters in standard k-means to our transformation-based approach. Results are shown for a wide variety of datasets, demonstrating that the algorithm transforms and clusters well in skewed cases, and that in situations where standard k-means does well our algorithm has little to no effect on the resulting clustering.
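
The idea of estimating a transformation parameter together with the clustering can be illustrated with a minimal sketch. It is not the authors' implementation, which the abstract does not specify. As assumptions made only for illustration, it fixes the parametric family to a Box-Cox transform with a single shared parameter, searches a small grid of parameter values, and scores each value with a spherical-Gaussian criterion that includes the Box-Cox Jacobian term so that scores are comparable across transformations. The function names `box_cox`, `lloyd_kmeans`, and `transformed_kmeans` are hypothetical.

```python
import numpy as np

def box_cox(x, lam):
    """Box-Cox transform of strictly positive data (log when lam is 0)."""
    return np.log(x) if abs(lam) < 1e-12 else (x**lam - 1.0) / lam

def lloyd_kmeans(y, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns labels and within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    centers = y[rng.choice(len(y), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d2 = ((y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = np.array([
            y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d2 = ((y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    wss = d2[np.arange(len(y)), labels].sum()
    return labels, wss

def transformed_kmeans(x, k, lambdas):
    """Grid search over the Box-Cox parameter (an illustrative assumption).

    Each candidate is scored by a negative profile log-likelihood under
    spherical Gaussian clusters, including the Box-Cox Jacobian term.
    """
    n, p = x.shape
    sum_log_x = np.log(x).sum()
    best = None
    for lam in lambdas:
        labels, wss = lloyd_kmeans(box_cox(x, lam), k)
        crit = 0.5 * n * p * np.log(wss / (n * p)) - (lam - 1.0) * sum_log_x
        if best is None or crit < best[0]:
            best = (crit, lam, labels)
    return best[1], best[2]

# Usage on synthetic right-skewed (log-normal) data with two groups.
rng = np.random.default_rng(1)
x = np.vstack([
    np.exp(rng.normal(0.0, 0.5, size=(100, 2))),
    np.exp(rng.normal(3.0, 0.5, size=(100, 2))),
])
lambdas = [-0.5, 0.0, 0.5, 1.0]
lam, labels = transformed_kmeans(x, 2, lambdas)
print("selected lambda:", lam)
```

A grid search keeps the sketch short. Alternating between a k-means step and a numerical update of the transformation parameter would be another way to organize the same computation.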
