JSM 2017 Online Program

Activity Number:	460 - Clustering Methods for Big Data Problems
Type:	Topic Contributed
Date/Time:	Wednesday, August 2, 2017 : 8:30 AM to 10:20 AM
Sponsor:	Section on Statistical Learning and Data Science
Abstract #323622
Title:	Efficient Parallelized K-Means for Clustering Big Data
Author(s):	Geoffrey Thompson* and Ranjan Maitra
Companies:	Iowa State University and Iowa State University
Keywords:	k-means ; clustering ; big data
Abstract:	Hartigan and Wong's method for k-means clustering has some advantages in both speed and quality of solution over the commonly used Lloyd's method. However, the latter is readily done in parallel, which makes it feasible to use on large data sets, while the former is not. Here we present a parallelized method based on Hartigan and Wong's algorithm.
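
To illustrate why Lloyd's method is readily done in parallel, here is a minimal sketch of one Lloyd iteration split over chunks of the data. It is not the authors' method, and the names chunk_stats and lloyd_step, the n_chunks parameter, and the use of a thread pool are all assumptions made for illustration. Each chunk needs only the current centers, so the chunks can be processed independently and their per-cluster sums and counts added together afterwards. Hartigan and Wong's method, by contrast, considers moving one point at a time and updates the affected centers immediately, so each decision depends on the ones before it.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def chunk_stats(chunk, centers):
    """Assign each point in one chunk to its nearest center.

    Returns the per-cluster coordinate sums and point counts for this chunk.
    """
    # Squared Euclidean distances, shape (n_points_in_chunk, k).
    d2 = ((chunk[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)

    k, p = centers.shape
    sums = np.zeros((k, p))
    counts = np.zeros(k, dtype=int)
    # Unbuffered accumulation so repeated labels are all counted.
    np.add.at(sums, labels, chunk)
    np.add.at(counts, labels, 1)
    return sums, counts


def lloyd_step(data, centers, n_chunks=4):
    """One Lloyd iteration: chunked assignment, then a reduction to new centers."""
    chunks = np.array_split(data, n_chunks)

    # Each chunk depends only on the (fixed) current centers, so the
    # chunks can be handled in any order or at the same time.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda c: chunk_stats(c, centers), chunks))

    # Reduction: add up the partial sums and counts from all chunks.
    sums = sum(r[0] for r in results)
    counts = sum(r[1] for r in results)

    # Keep the old center for any cluster that received no points.
    new_centers = centers.astype(float).copy()
    nonempty = counts > 0
    new_centers[nonempty] = sums[nonempty] / counts[nonempty, None]
    return new_centers


if __name__ == "__main__":
    data = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]])
    centers = np.array([[1.0, 1.0], [9.0, 9.0]])
    print(lloyd_step(data, centers, n_chunks=2))
```

The updated centers do not depend on how the data are split, since the reduction only adds partial sums and counts. A thread pool is used here only to keep the sketch short; on large data sets the same pattern would more plausibly be run across processes or machines.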

Authors who are presenting talks have a * after their name.