JSM 2014 Home

Abstract Details

Activity Number: 9
Type: Invited
Date/Time: Sunday, August 3, 2014 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Computing
Abstract #310792
Title: Divide and Recombine for Large Complex Data: Statistical Theory for Division and Recombination Methods
Author(s): William S. Cleveland+ and Philip Gautier*
Companies: Purdue University and Purdue University
Keywords: Hadoop ; Visualization ; MapReduce ; Clusters ; Mathematical Statistics ; Statistical Efficiency
Abstract:

Divide & Recombine (D&R) is a statistical approach to the analysis of large complex data. The goal is to provide a data analyst with: D&R statistical methods and a D&R computational environment that enable the study of large data with almost the same comprehensiveness and detail that is possible for small data; analysis using an interactive language for data analysis that is both highly flexible and enables highly time-efficient programming with the data; and, coupled with the language, a distributed database and parallel compute engine that make computation feasible and practical. In D&R the analyst: divides the data into subsets using a division method; applies analytic methods to each subset; and recombines the outputs of each analytic method using a recombination method to form a D&R result based on all of the data. Computations can be nearly embarrassingly parallel, the simplest possible form of parallel processing. The statistical division and recombination methods have an immense impact on the statistical accuracy of the D&R result for an analytic method. A critical research thrust in D&R is developing "best" division and recombination methods for analytic methods to optimize statistical accuracy.
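The divide, apply, recombine steps described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the talk: the division method is random assignment of rows to subsets, the analytic method is a simple least-squares line fit, and the recombination method is an unweighted average of the per-subset coefficients. The function names (divide, fit_line, recombine, divide_and_recombine) and the simulated data are hypothetical, and the sketch runs serially rather than on Hadoop or any distributed engine.

```python
import random
from statistics import fmean


def divide(rows, n_subsets, seed=0):
    """Assumed division method: shuffle rows, then deal them into n_subsets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    return [shuffled[i::n_subsets] for i in range(n_subsets)]


def fit_line(subset):
    """Assumed analytic method: least-squares fit of y = intercept + slope * x."""
    xs = [x for x, _ in subset]
    ys = [y for _, y in subset]
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in subset)
    slope = sxy / sxx
    return (y_bar - slope * x_bar, slope)


def recombine(fits):
    """Assumed recombination method: average the per-subset coefficients."""
    intercepts = [f[0] for f in fits]
    slopes = [f[1] for f in fits]
    return (fmean(intercepts), fmean(slopes))


def divide_and_recombine(rows, n_subsets, seed=0):
    subsets = divide(rows, n_subsets, seed)
    # Each subset is fitted independently, so this step could run in parallel.
    fits = [fit_line(s) for s in subsets]
    return recombine(fits)


# Simulated data (hypothetical): y = 2 + 3x plus standard normal noise.
rng = random.Random(42)
data = []
for _ in range(10_000):
    x = rng.uniform(0, 10)
    data.append((x, 2.0 + 3.0 * x + rng.gauss(0, 1)))

dr_intercept, dr_slope = divide_and_recombine(data, n_subsets=20)
full_intercept, full_slope = fit_line(data)
```

Comparing (dr_intercept, dr_slope) with (full_intercept, full_slope) gives a rough sense of how much accuracy this particular pairing of division and recombination methods gives up relative to fitting all of the data at once. Other pairings would behave differently, which is the question the abstract raises.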


Authors who are presenting talks have a * after their name.
