JSM 2012 Online Program

The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.

Abstract Details

Activity Number: 174
Type: Contributed
Date/Time: Monday, July 30, 2012 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Computing
Abstract - #305991
Title: Model-Based Clustering Analysis of Large Climate Simulation Data Sets
Author(s): Wei-Chen Chen*+ and George Ostrouchov and David Pugmire and Mr Prabhat and Michael Wehner
Companies: Oak Ridge National Laboratory and Oak Ridge National Laboratory and Oak Ridge National Laboratory and Lawrence Berkeley National Laboratory and Lawrence Berkeley National Laboratory
Address: Oak Ridge, TN, 37830, United States
Keywords: model-based clustering ; unsupervised learning ; EM ; APECM ; CAM ; SPMD

We develop a parallel expectation-maximization (EM) algorithm for model-based clustering using high-performance computing techniques. The algorithm follows the single program, multiple data (SPMD) programming model to reduce communication between processors, and it scales to clustering ultra-large datasets (hundreds of terabytes). The same technique can improve the scalability of EM-like algorithms such as AECM and APECM. Moreover, these parallel algorithms generalize readily to other finite mixture models. We demonstrate the performance of our parallel algorithm on a high-resolution climate dataset produced by the Community Atmosphere Model (CAM5). An accompanying R package, 'pmclust', for parallel model-based clustering is available on CRAN.
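The SPMD idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the 'pmclust' API. It assumes a one-dimensional Gaussian mixture, and it simulates the processors with a plain loop over data chunks; all function names (`local_e_step`, `allreduce_sum`, `m_step`, `parallel_em`) are hypothetical. Each simulated rank runs the same E-step on its own chunk and produces only a few sufficient statistics per cluster, so the only values that would need to cross processors are those small sums, which a real SPMD program would combine with an MPI-style allreduce.

```python
import math
import random

def local_e_step(chunk, weights, means, variances):
    """E-step on one rank's chunk: return local sufficient statistics."""
    k = len(weights)
    n_k, sum_x, sum_xx = [0.0] * k, [0.0] * k, [0.0] * k
    loglik = 0.0
    for x in chunk:
        # Weighted Gaussian density of x under each component.
        dens = [
            weights[j]
            * math.exp(-0.5 * (x - means[j]) ** 2 / variances[j])
            / math.sqrt(2.0 * math.pi * variances[j])
            for j in range(k)
        ]
        total = sum(dens)
        loglik += math.log(total)
        for j in range(k):
            r = dens[j] / total  # responsibility of component j for x
            n_k[j] += r
            sum_x[j] += r * x
            sum_xx[j] += r * x * x
    return n_k, sum_x, sum_xx, loglik

def allreduce_sum(per_rank):
    """Stand-in for an MPI allreduce: element-wise sum across ranks."""
    k = len(per_rank[0][0])
    n_k = [sum(s[0][j] for s in per_rank) for j in range(k)]
    sum_x = [sum(s[1][j] for s in per_rank) for j in range(k)]
    sum_xx = [sum(s[2][j] for s in per_rank) for j in range(k)]
    loglik = sum(s[3] for s in per_rank)
    return n_k, sum_x, sum_xx, loglik

def m_step(n_k, sum_x, sum_xx):
    """M-step from global statistics; identical on every rank."""
    n = sum(n_k)
    weights = [c / n for c in n_k]
    means = [sx / c for sx, c in zip(sum_x, n_k)]
    # Floor the variance to avoid a degenerate (zero-variance) component.
    variances = [
        max(sxx / c - m * m, 1e-6) for sxx, c, m in zip(sum_xx, n_k, means)
    ]
    return weights, means, variances

def parallel_em(chunks, weights, means, variances, iters=50):
    """Run EM where each chunk plays the role of one SPMD rank."""
    loglik = float("-inf")
    for _ in range(iters):
        # In a real SPMD run these calls execute concurrently, one per rank.
        stats = [local_e_step(c, weights, means, variances) for c in chunks]
        n_k, sum_x, sum_xx, loglik = allreduce_sum(stats)
        weights, means, variances = m_step(n_k, sum_x, sum_xx)
    # loglik is evaluated at the parameters used in the final E-step.
    return weights, means, variances, loglik

# Synthetic data: two well-separated clusters, split across 4 simulated ranks.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(500)]
data += [random.gauss(10.0, 1.0) for _ in range(500)]
random.shuffle(data)
chunks = [data[i::4] for i in range(4)]

weights, means, variances, loglik = parallel_em(
    chunks, [0.5, 0.5], [1.0, 8.0], [1.0, 1.0]
)
```

Because the M-step depends only on the summed statistics, the fitted parameters do not depend on how the data are partitioned (up to floating-point rounding). The communication cost per iteration grows with the number of clusters rather than with the number of observations, which is the property the sketch is meant to show.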

The address information is for the authors that have a + after their name.
Authors who are presenting talks have a * after their name.
