JSM 2013 Online Program

Abstract Details

Activity Number: 109
Type: Invited
Date/Time: Monday, August 5, 2013 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Computing
Abstract - #307323
Title: Statistical Theory and Methods for the Divide and Recombine (D&R) Statistical Approach to Large Complex Data
Author(s): William S. Cleveland*+
Companies: Purdue University
Keywords: parallel computation ; Hadoop ; MapReduce ; cognostics ; data visualization ; statistical efficiency
Abstract:

In D&R, a large complex dataset is divided into subsets, each computationally manageable. Analytic methods, both visualization and numeric, are applied to each subset, and the outputs of each method are recombined to form a result for all of the data. Subset computations are embarrassingly parallel: they require no communication among them, which is the simplest form of parallel processing. The computation is therefore feasible and very practical, unlike direct application of the analytic method to all of the data. The D&R result is almost always different from the hypothetical direct all-data result, and it is not as statistically accurate. Still, D&R can achieve excellent, very acceptable accuracy. The key elements are the division and recombination methods, which determine the accuracy. Statistical thinking, theory, and methods for D&R aim to develop the "best" D&R methods given that the data must be divided. Computation on subsets as an algorithmic approach is not new. What is new in D&R is that, to achieve fast computation, it does not attempt the direct all-data computation; instead, it replaces that computation with a statistically smart choice of division and recombination methods that achieves high statistical accuracy.
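
The divide, apply, and recombine steps described in the abstract can be illustrated with a small sketch. The abstract does not specify any particular division or recombination method, so everything here is an assumption made for illustration: the function names (`fit_line`, `divide`, `recombine`, `make_data`) are hypothetical, the division is a random split into near-equal subsets, the analytic method is ordinary least squares for a straight line, and the recombination is an unweighted mean of the subset estimates. The data are simulated.

```python
import random
import statistics


def fit_line(points):
    """Analytic method: ordinary least squares for y = a + b*x on (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope  # (intercept, slope)


def divide(points, n_subsets, seed=0):
    """Division method (assumed): random split into near-equal subsets."""
    shuffled = points[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_subsets] for i in range(n_subsets)]


def recombine(estimates):
    """Recombination method (assumed): unweighted mean of subset estimates."""
    intercepts, slopes = zip(*estimates)
    return statistics.fmean(intercepts), statistics.fmean(slopes)


def make_data(n, seed=1):
    """Simulated data from y = 2 + 0.5*x + standard normal noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        data.append((x, 2.0 + 0.5 * x + rng.gauss(0.0, 1.0)))
    return data


data = make_data(10_000)
subsets = divide(data, n_subsets=20)

# Each subset fit is independent of the others (no communication),
# so these calls could be distributed across workers.
subset_fits = [fit_line(subset) for subset in subsets]

dr_fit = recombine(subset_fits)  # D&R result
all_fit = fit_line(data)         # direct all-data result, for comparison

print("D&R      (intercept, slope):", dr_fit)
print("All-data (intercept, slope):", all_fit)
```

On a dataset this small the direct all-data fit is easy to compute, which is what makes the comparison possible; in the setting the abstract addresses, only the subset fits and the recombination would be run. The two results generally differ slightly, and how much they differ depends on the division and recombination methods chosen.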


Authors who are presenting talks have a * after their name.



The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.

Copyright © American Statistical Association.