Abstract Details
Activity Number: 109
Type: Invited
Date/Time: Monday, August 5, 2013, 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Computing
Abstract - #307323
Title: Statistical Theory and Methods for the Divide and Recombine (D&R) Statistical Approach to Large Complex Data
Author(s): William S. Cleveland*+
Companies: Purdue University
Keywords: parallel computation; Hadoop; MapReduce; cognostics; data visualization; statistical efficiency
Abstract:
In D&R, a large complex dataset is divided into subsets, each computationally manageable. Analytic methods, both visualization and numeric, are applied to each subset. The outputs of each method are then recombined to form a result for all of the data. Subset computations are embarrassingly parallel: they require no communication among them, which is the simplest form of parallel processing. The computation is therefore feasible and very practical, in contrast to the direct application of the analytic method to all of the data. The D&R result is almost never the same as that of the hypothetical direct all-data computation, and it is not as statistically accurate. Still, D&R can achieve excellent, very acceptable accuracy. The key elements are the division and recombination methods, which determine the accuracy. Statistical thinking, theory, and methods for D&R have the goal of developing the "best" D&R methods, given that the data must be divided. Computation on subsets as an algorithmic approach is not new. What is new in D&R is that, to achieve fast computation, it does not attempt the direct all-data computation; instead, it replaces that computation with a statistically smart choice of division and recombination methods to achieve high statistical accuracy.
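As a rough illustration of the divide, apply, and recombine steps described in the abstract, the following minimal sketch estimates a regression slope on simulated data. Everything in it is an assumption made for illustration: the simulated data, the function names, the random division into 20 subsets, and the unweighted averaging used for recombination. The abstract does not specify which division or recombination methods the talk proposes.

```python
import random
import statistics


def ols_slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    mean_x = statistics.fmean(x for x, _ in points)
    mean_y = statistics.fmean(y for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


def divide(points, n_subsets, rng):
    """Hypothetical division method: shuffle, then deal rows into subsets."""
    shuffled = points[:]
    rng.shuffle(shuffled)
    return [shuffled[i::n_subsets] for i in range(n_subsets)]


def recombine(estimates):
    """Hypothetical recombination method: unweighted mean of subset outputs."""
    return statistics.fmean(estimates)


# Simulated data (an assumption): y = 2x + Gaussian noise.
rng = random.Random(0)
data = []
for _ in range(10_000):
    x = rng.uniform(0.0, 10.0)
    data.append((x, 2.0 * x + rng.gauss(0.0, 1.0)))

subsets = divide(data, 20, rng)

# Each subset is processed independently, with no communication between
# subsets, so this map could be handed to parallel workers unchanged.
subset_slopes = list(map(ols_slope, subsets))

dr_slope = recombine(subset_slopes)   # D&R estimate
all_data_slope = ols_slope(data)      # direct all-data estimate, for comparison

print("D&R slope:     ", dr_slope)
print("All-data slope:", all_data_slope)
```

The two estimates generally differ slightly, which mirrors the abstract's point that the D&R result is not the same as the all-data result. How close they are depends on the division and recombination methods chosen.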
Authors who are presenting talks have a * after their name.
2013 JSM Online Program Home
The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.
Copyright © American Statistical Association.