Activity Number: 486 - Computing Kaleidoscope
Type: Contributed
Date/Time: Wednesday, August 1, 2018 : 8:30 AM to 10:12 AM
Sponsor: Section on Statistical Computing
Abstract #330615 | Presentation
Title: A Transformation-Based K-Means Algorithm for Skewed Data
Author(s): Nicholas S Berry* and Ranjan Maitra
Companies: Iowa State University and Iowa State University
Keywords: k-means; clustering; algorithm; computation; unsupervised; transformation
Abstract:
The k-means algorithm is a widely used approach for getting an unsupervised first look at a dataset. While prized mainly for its simplicity and speed, k-means generally lacks flexibility. Its most notable limitations are its inability to accurately cluster data whose groups are not spherical and its requirement that the number of clusters be specified in advance. This presentation outlines an extension of the k-means algorithm that estimates an appropriate transformation for the data in parallel with clustering that data. The implementation is general in the sense that the user passes a parametric transformation to the algorithm, and the parameters of that transformation are estimated to optimize the clustering of the dataset. Additionally, we extend existing methods for estimating the number of clusters in standard k-means to our transformation-based approach. Results are shown for a wide variety of datasets, demonstrating that the algorithm transforms and clusters well in skewed cases, and that in situations where standard k-means does well our algorithm has little to no effect on the resulting clustering.
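The general idea of choosing a transformation parameter alongside a k-means fit can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify in detail. It assumes strictly positive data, a single Box-Cox parameter shared by all columns, a coarse grid search over that parameter, and a geometric-mean-normalized Box-Cox transform so that within-cluster sums of squares are comparable across parameter values. The function names `normalized_boxcox`, `kmeans`, and `transform_and_cluster` are hypothetical.

```python
import numpy as np


def normalized_boxcox(x, lam):
    """Box-Cox transform of a positive 1-D array, scaled by the geometric
    mean so sums of squares are comparable across values of lam."""
    gm = np.exp(np.mean(np.log(x)))
    if abs(lam) < 1e-8:
        return gm * np.log(x)
    return (x ** lam - 1.0) / (lam * gm ** (lam - 1.0))


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm from one random start.
    Returns cluster labels and the within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center as is
                centers[j] = X[labels == j].mean(axis=0)
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1), dist.min(axis=1).sum()


def transform_and_cluster(X, k, lambdas=np.linspace(-1.0, 2.0, 13)):
    """Assumed scheme: grid-search one shared Box-Cox parameter and keep the
    value whose transformed data gives the smallest k-means objective."""
    best = None
    for lam in lambdas:
        Z = np.column_stack(
            [normalized_boxcox(X[:, j], lam) for j in range(X.shape[1])]
        )
        labels, wss = kmeans(Z, k)
        if best is None or wss < best[2]:
            best = (float(lam), labels, wss)
    return best  # (lambda, labels, within-cluster sum of squares)
```

Calling `transform_and_cluster(X, k=2)` on a positive-valued array returns the selected parameter, the labels, and the objective value. A single random start is used for brevity; multiple restarts and a per-column parameter would be natural refinements.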
Authors who are presenting talks have a * after their name.