|
Activity Number:
|
416
|
|
Type:
|
Contributed
|
|
Date/Time:
|
Wednesday, August 5, 2009 : 8:30 AM to 10:20 AM
|
|
Sponsor:
|
Section on Statistical Computing
|
| Abstract - #305455 |
|
Title:
|
A k-Mean-Directions Algorithm for Clustering Data on the Sphere
|
|
Author(s):
|
Ivan Ramler*+ and Ranjan Maitra
|
|
Companies:
|
St. Lawrence University and Iowa State University
|
|
Address:
|
Valentine 219, Canton, NY 13617
|
|
Keywords:
|
directional; information retrieval; von Mises; spkmeans; Langevin; k-means
|
|
Abstract:
|
A k-means-type algorithm is proposed for efficiently clustering data constrained to lie on the surface of a p-dimensional unit sphere, or data that have been standardized to mean zero and unit variance, as happens when Euclidean distance is used to cluster time-series gene expression data under a correlation metric. We also provide methodology to initialize the algorithm and to estimate the number of clusters in the data set. Results from a detailed series of experiments show excellent performance, even with very large data sets. The methodology is applied to the submitted abstracts of oral presentations made at the 2008 Joint Statistical Meetings to identify similar topics.
|
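The abstract does not spell out the steps of the k-mean-directions algorithm, so the following is only a minimal sketch of the generic spherical k-means idea (the "spkmeans" of the keywords), not the authors' method. It assumes observations are rows of a NumPy array, that rows are scaled to unit Euclidean norm rather than unit variance (the two differ only by a constant factor), and that starting centers are drawn at random from the data. The names `standardize_rows` and `spherical_kmeans` are hypothetical. The sketch also shows why correlation-based clustering reduces to clustering on a sphere: for two rows x and y standardized this way, the squared Euclidean distance equals 2(1 - r), where r is their Pearson correlation.

```python
import numpy as np


def standardize_rows(X):
    """Center each row to mean zero and scale it to unit Euclidean norm.

    Assumes no row is constant (a constant row would have zero norm).
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=1, keepdims=True)
    return centered / np.linalg.norm(centered, axis=1, keepdims=True)


def spherical_kmeans(X, k, n_iter=100, seed=0):
    """Generic spherical k-means sketch (not the k-mean-directions algorithm).

    Each point is assigned to the center with the largest inner product;
    each center is then reset to the normalized sum of its members.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # project onto the sphere

    # Assumption: random data points serve as starting centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)

    for _ in range(n_iter):
        # Assignment step: the largest cosine similarity wins.
        new_labels = np.argmax(X @ centers.T, axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so stop
        labels = new_labels

        # Update step: mean direction of each non-empty cluster.
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                continue  # keep the old center for an empty cluster
            total = members.sum(axis=0)
            norm = np.linalg.norm(total)
            if norm > 0:
                centers[j] = total / norm

    return labels, centers


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    raw = rng.normal(size=(40, 6))  # synthetic data, for illustration only
    Z = standardize_rows(raw)

    # Squared distance between standardized rows versus 2 * (1 - correlation).
    r = np.corrcoef(raw[0], raw[1])[0, 1]
    print(np.sum((Z[0] - Z[1]) ** 2), 2.0 * (1.0 - r))

    labels, centers = spherical_kmeans(Z, k=3)
    print(np.bincount(labels, minlength=3))
```

Like ordinary k-means, this sketch depends on its starting centers and takes the number of clusters as given, which are the two issues the abstract says the proposed methodology addresses.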