JSM 2014 Home

Abstract Details

Activity Number: 9
Type: Invited
Date/Time: Sunday, August 3, 2014 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Computing
Abstract #310890
Title: Clustering and Feature Selection for Big Data
Author(s): Damir Spisic*+ and Jing Shyr and Jing Xu
Companies: IBM and IBM and IBM SPSS
Keywords: Clustering ; Feature Selection ; MapReduce ; Distributed Data ; Mixed Variables ; Data Compression
Abstract:

Clustering algorithms for big data require large computational resources, and the process is especially challenging when the data are spread across distributed sources. Moreover, wrapper-type approaches to unsupervised feature selection typically add an iterative clustering requirement. Data compression is used as the initial step to address some of these challenges. The TwoStep clustering algorithm pre-clusters large data sets with mixed categorical and continuous variables in the first step, then uses hierarchical clustering to generate a final solution in the second step. We provide an extension of this algorithm in the MapReduce framework, in which pre-clustering is executed in parallel on distributed data sources. Partial clustering is introduced to preserve solution accuracy when merging pre-clustered data from separate data sources. An ensemble approach is used to select features for clustering based on the corresponding selection for each data source. Experimental results comparing the accuracy and performance of this approach with previous methods are presented and discussed.
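
As a rough illustration of the two-step idea described above (compress each data source into sub-cluster summaries, then cluster the summaries hierarchically), a minimal Python sketch might look like the following. It is not the authors' implementation: it assumes continuous variables only, Euclidean distance, and a simple threshold-based pre-clustering pass in place of whatever compression structure the actual algorithm uses. The function names `pre_cluster` and `hierarchical_merge`, the `threshold` parameter, and the toy data are all hypothetical, and the partial clustering and feature selection steps mentioned in the abstract are not shown.

```python
from math import sqrt

# A sub-cluster summary is (count, per-dimension sum); centroid = sum / count.
# Summaries are additive, so they can be merged without revisiting raw data.

def euclid(p, q):
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def centroid(summary):
    n, sums = summary
    return [x / n for x in sums]

def merge(a, b):
    return (a[0] + b[0], [x + y for x, y in zip(a[1], b[1])])

def pre_cluster(points, threshold):
    """Map step (assumed form): compress one partition into summaries.

    Each point joins the nearest sub-cluster whose centroid lies within
    `threshold`; otherwise it starts a new sub-cluster.
    """
    summaries = []
    for p in points:
        best, best_d = None, threshold
        for i, s in enumerate(summaries):
            d = euclid(p, centroid(s))
            if d <= best_d:
                best, best_d = i, d
        if best is None:
            summaries.append((1, list(p)))
        else:
            summaries[best] = merge(summaries[best], (1, list(p)))
    return summaries

def hierarchical_merge(summaries, k):
    """Reduce step (assumed form): repeatedly merge the two summaries with
    the closest centroids until only k clusters remain."""
    clusters = list(summaries)
    while len(clusters) > k:
        _, i, j = min(
            (euclid(centroid(clusters[i]), centroid(clusters[j])), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merged = merge(clusters[i], clusters[j])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters

# Two toy "data sources"; in a MapReduce setting each would be pre-clustered
# by a separate mapper, and only the summaries would be sent to the reducer.
partitions = [
    [(0.0, 0.0), (0.2, 0.1), (10.0, 10.0)],
    [(0.1, 0.3), (9.8, 10.1), (10.2, 9.9)],
]
local = [s for part in partitions for s in pre_cluster(part, threshold=1.0)]
final = hierarchical_merge(local, k=2)
sizes = sorted(n for n, _ in final)
```

The point of the design is that only the compact summaries, not the raw records, need to cross source boundaries before the final hierarchical step.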


Authors who are presenting talks have a * after their name.

