JSM 2013 Home

Abstract Details

Activity Number: 392
Type: Other
Date/Time: Tuesday, August 6, 2013 : 2:00 PM to 3:50 PM
Sponsor: ASA
Abstract - #307035
Title: Divide and Recombine (D&R) with RHIPE for Large Complex Data
Author(s): William S. Cleveland*+
Companies: Purdue University
Keywords: big data ; distributed computing ; statistical theory and methods ; data visualization ; Hadoop ; R
Abstract:

D&R is a statistical framework for large complex data. The data are divided into subsets, and numeric and visualization methods are applied to each subset. Then the subset outputs for each analytic method are recombined. D&R computation is embarrassingly parallel: subset computations do not communicate with one another. D&R can exploit a distributed database and parallel compute engine like Hadoop. RHIPE is a merger of R and Hadoop that allows an R user to apply D&R from within R. Almost any R function or package can be used. In one test, logistic regression with 1,073,741,824 rows and 127 predictors (1 terabyte) took 17 min on a modest cluster. The final D&R result for an analytic method is generally not the same as the direct all-data result that would have occurred had its computation not been infeasible or impractical. In D&R, each analytic method has a division method and a recombination method. These methods have an immense impact on the statistical accuracy of the D&R result. Statistical theory and methods for D&R consist of developing division and recombination methods for analytic methods that are as accurate as possible given that the data must be divided.
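
The divide, apply, and recombine steps described in the abstract can be illustrated with a small sketch. This is not RHIPE and not the author's method: it is plain Python on simulated data, and the analytic method (a one-predictor least-squares fit), the division method (random assignment to subsets), and the recombination method (averaging subset coefficients) are all assumptions chosen for illustration. The function names are hypothetical. It shows that the recombined estimate is close to, but generally not identical to, the direct all-data estimate.

```python
import random
from statistics import mean


def fit_line(rows):
    """Least-squares fit of y = a + b*x on one set of (x, y) rows; returns (a, b)."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in rows)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def divide(rows, n_subsets, seed=0):
    """Division method (assumed): shuffle, then deal rows into n_subsets subsets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_subsets] for i in range(n_subsets)]


def recombine(estimates):
    """Recombination method (assumed): average each coefficient across subsets."""
    return tuple(mean(e[k] for e in estimates) for k in range(2))


def make_data(n, seed=1):
    """Simulated data (illustrative only): y = 2 + 0.5*x + standard normal noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        rows.append((x, 2.0 + 0.5 * x + rng.gauss(0, 1)))
    return rows


def run_demo(n=10_000, n_subsets=20):
    rows = make_data(n)
    subsets = divide(rows, n_subsets)
    # Each subset fit uses only its own rows, so these calls are independent
    # and could be run in parallel without any communication between them.
    estimates = [fit_line(s) for s in subsets]
    return recombine(estimates), fit_line(rows)


if __name__ == "__main__":
    dr_result, direct_result = run_demo()
    print("D&R (intercept, slope):     ", dr_result)
    print("All-data (intercept, slope):", direct_result)
```

Changing `divide` or `recombine` (for example, weighting subsets differently) changes how close the recombined estimate is to the all-data estimate, which is the accuracy question the abstract raises.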


Authors who are presenting talks have a * after their name.



The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.

Copyright © American Statistical Association.