|
Activity Number:
|
242
|
|
Type:
|
Contributed
|
|
Date/Time:
|
Tuesday, August 8, 2006 : 8:30 AM to 10:20 AM
|
|
Sponsor:
|
Section on Physical and Engineering Sciences
|
| Abstract - #306211 |
|
Title:
|
K-Means Clustering: a Novel Probabilistic Formulation, with Some Applications
|
|
Author(s):
|
Samiran Ghosh*+ and Dipak Dey
|
|
Companies:
|
University of Connecticut and University of Connecticut
|
|
Address:
|
215 Glenbrook Road, U-4120, Storrs, CT 06269
|
|
Keywords:
|
Bayesian computation ; k-means clustering ; Mahalanobis distance ; Markov chain Monte Carlo ; multivariate exponential power family
|
|
Abstract:
|
K-means is one of the simplest partition-based clustering algorithms. It can be shown that its computational complexity does not grow exponentially with dimensionality. The only crucial requirements are knowledge of the number of clusters and the computation of a suitably chosen similarity measure. Because of this simplicity and scalability, K-means remains an attractive alternative to competing clustering philosophies. However, being a deterministic algorithm, traditional K-means has several drawbacks: it offers only a hard decision rule, with no probabilistic interpretation. In this paper we develop a decision-theoretic framework that gives traditional K-means a probabilistic footing. This not only enables soft clustering but also allows the whole optimization problem to be recast in a Bayesian modeling framework.
|
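The contrast between a hard decision rule and soft clustering can be illustrated with a minimal sketch. It is not the authors' formulation, which the abstract does not spell out; it is a generic illustration that assumes squared Euclidean distance and membership weights proportional to exp(-beta * distance). The function names `hard_assign` and `soft_assign` and the sharpness parameter `beta` are hypothetical, introduced only for this example.

```python
import math


def sq_dist(x, c):
    """Squared Euclidean distance between a point and a center."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def hard_assign(x, centers):
    """Classic K-means rule: return the index of the nearest center."""
    return min(range(len(centers)), key=lambda k: sq_dist(x, centers[k]))


def soft_assign(x, centers, beta=1.0):
    """Illustrative soft rule (an assumption, not the paper's model):
    membership weights proportional to exp(-beta * squared distance)."""
    d = [sq_dist(x, c) for c in centers]
    d_min = min(d)  # shift by the minimum for numerical stability
    w = [math.exp(-beta * (di - d_min)) for di in d]
    total = sum(w)
    return [wi / total for wi in w]


# Two made-up centers and a point lying slightly closer to the first one.
centers = [(0.0, 0.0), (4.0, 0.0)]
point = (1.9, 0.0)

label = hard_assign(point, centers)               # a single cluster index
weights = soft_assign(point, centers, beta=1.0)   # one weight per cluster
print(label, [round(w, 3) for w in weights])
```

The hard rule reports only the nearest cluster, while the soft weights show how close the point is to the boundary. In this sketch, increasing `beta` pushes the weights toward the hard assignment.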