JSM 2011 Online Program

The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.

Abstract Details

Activity Number: 187
Type: Topic Contributed
Date/Time: Monday, August 1, 2011 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Computing
Abstract - #301605
Title: Optimal Variable Weighting in K-Means Clustering
Author(s): Shaonan Zhang*+ and Jiaqiao Hu and Wei Zhu
Companies: The State University of New York at Stony Brook and The State University of New York at Stony Brook and The State University of New York at Stony Brook
Address: Department of Applied Mathematics and Statistics, Stony Brook, NY, 11790
Keywords: K-means clustering ; variable weights ; optimization ; principal component analysis (PCA)
Abstract:

The K-means clustering method is a widely adopted classic clustering algorithm. Weighted K-means clustering is an extension of K-means clustering in which a set of nonnegative weights is assigned to the variables. In this paper, we aim to derive the optimal variable weights for weighted K-means clustering in order to obtain more meaningful and interpretable clusters. We improve on the weighted K-means clustering method (MH Huh, YB Lim 2009) by introducing a new algorithm, based on the Karush-Kuhn-Tucker conditions, that yields variable weights guaranteed to be globally optimal. We first present the theoretical formulation and the derivation of the optimal weights. We then provide an iterative algorithm to compute these weights. Numerical examples on both simulated and real data illustrate our method. The examples show that our method outperforms the originally proposed method in terms of classification accuracy and computational efficiency. Finally, a modified solution based on principal component analysis is proposed to further improve the computational efficiency of K-means clustering for data sets with a large number of variables.
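
To make the idea of variable weights concrete, the following is a minimal sketch of K-means under a variable-weighted squared Euclidean distance, with the weights supplied by the user. It is not the authors' algorithm: the abstract does not give the objective, the constraints, or the Karush-Kuhn-Tucker-based weight update, so none of that is reproduced here. The function name `weighted_kmeans` and its parameters are hypothetical, chosen only for illustration. The sketch relies on one standard fact: the weighted squared distance sum_j w_j (x_j - c_j)^2 equals the ordinary squared distance after each variable is scaled by sqrt(w_j).

```python
import numpy as np


def weighted_kmeans(X, k, weights, n_iter=100, seed=0):
    """Lloyd's algorithm with fixed, user-supplied variable weights.

    Hypothetical helper for illustration only; the weights are NOT
    optimized here, unlike the method described in the abstract.
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)
    if w.shape != (X.shape[1],) or np.any(w < 0):
        raise ValueError("weights must be nonnegative, one per variable")

    # Scaling variable j by sqrt(w_j) turns the weighted squared distance
    # into an ordinary squared Euclidean distance.
    Z = X * np.sqrt(w)

    rng = np.random.default_rng(seed)
    # Start from k distinct observations chosen at random.
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    labels = np.full(len(Z), -1)

    for _ in range(n_iter):
        # Squared distance from every observation to every center.
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = Z[labels == c]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    return labels


# Usage: the first variable separates two groups, the second is pure noise.
demo_rng = np.random.default_rng(1)
group_a = np.column_stack([demo_rng.normal(0, 0.1, 50), demo_rng.normal(0, 50, 50)])
group_b = np.column_stack([demo_rng.normal(5, 0.1, 50), demo_rng.normal(0, 50, 50)])
X_demo = np.vstack([group_a, group_b])
labels_demo = weighted_kmeans(X_demo, k=2, weights=[1.0, 0.0])
```

In this toy setting, giving the noisy variable a weight of zero removes it from the distance entirely, which is the kind of effect a data-driven choice of weights is meant to achieve automatically.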


The address information is for the authors that have a + after their name.
Authors who are presenting talks have a * after their name.
