Abstract Details

Activity Number: 486 - Computing Kaleidoscope
Type: Contributed
Date/Time: Wednesday, August 1, 2018 : 8:30 AM to 10:12 AM
Sponsor: Section on Statistical Computing
Abstract #330615
Title: A Transformation-Based K-Means Algorithm for Skewed Data
Author(s): Nicholas S Berry* and Ranjan Maitra
Companies: Iowa State University and Iowa State University
Keywords: k-means; clustering; algorithm; computation; unsupervised; transformation

The k-means algorithm is a widely used approach for getting an unsupervised first look at a dataset. While prized mainly for its simplicity and speed, k-means generally lacks flexibility. Its most notable limitations are its inability to accurately cluster data whose clusters are non-spherical and its requirement that the number of clusters be specified in advance. This presentation outlines an extension of the k-means algorithm that simultaneously estimates an appropriate transformation for the data and clusters that data. The implementation is general in the sense that the user passes a parametric transformation to the algorithm, which then estimates the parameters of that transformation for optimal clustering of the dataset. Additionally, we extend existing approaches for estimating the number of clusters in standard k-means to our transformation-based approach. Results are shown for a wide variety of datasets, demonstrating that the algorithm transforms and clusters well in skewed cases, but also that in situations where standard k-means does well our algorithm has little to no effect on the resulting clustering.
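
To make the idea of estimating a transformation alongside the clustering concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a Box-Cox power transformation with a single parameter, a grid search over that parameter, and a spherical Gaussian profile log-likelihood with a Jacobian correction as the selection criterion. The abstract specifies none of these choices, and the function names box_cox, lloyd_kmeans, and transformed_kmeans are hypothetical.

```python
import numpy as np


def box_cox(x, lam):
    """Box-Cox power transform for strictly positive data (assumed family)."""
    if abs(lam) < 1e-12:
        return np.log(x)
    return (x ** lam - 1.0) / lam


def lloyd_kmeans(z, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns labels, centers, within-cluster SS."""
    rng = np.random.default_rng(seed)
    centers = z[rng.choice(len(z), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Keep the old center if a cluster happens to become empty.
        new_centers = np.array([
            z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    wss = d2[np.arange(len(z)), labels].sum()
    return labels, centers, wss


def transformed_kmeans(x, k, lambdas, n_starts=5):
    """Grid-search the transform parameter, clustering at each candidate.

    Assumed criterion: the profile log-likelihood of equal-variance
    spherical Gaussian clusters on the transformed scale, plus the log
    Jacobian of the Box-Cox transform, so that scores are comparable
    across values of lam. This is an illustrative choice only.
    """
    n, p = x.shape
    sum_log_x = np.log(x).sum()
    best = None
    for lam in lambdas:
        z = box_cox(x, lam)
        runs = [lloyd_kmeans(z, k, seed=s) for s in range(n_starts)]
        labels, centers, wss = min(runs, key=lambda r: r[2])
        score = -0.5 * n * p * np.log(wss / (n * p)) + (lam - 1.0) * sum_log_x
        if best is None or score > best[0]:
            best = (score, lam, labels, centers)
    return best


# Usage on synthetic right-skewed data: two log-normal clusters.
rng = np.random.default_rng(1)
log_x = np.vstack([
    rng.normal(0.0, 0.3, size=(100, 2)),
    rng.normal(3.0, 0.3, size=(100, 2)),
])
x = np.exp(log_x)
score, lam_hat, labels, centers = transformed_kmeans(
    x, k=2, lambdas=np.linspace(-1.0, 1.0, 9)
)
```

The Jacobian term matters in this sketch: comparing raw within-cluster sums of squares across different parameter values would simply favor whichever transformation shrinks the data most. A full implementation in the spirit of the abstract would accept an arbitrary user-supplied parametric transformation and would also estimate the number of clusters, neither of which is attempted here.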

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program