JSM 2016 Online Program

Activity Number:	445
Type:	Contributed
Date/Time:	Tuesday, August 2, 2016 : 2:00 PM to 3:50 PM
Sponsor:	Section on Statistical Learning and Data Science
Abstract #320508
Title:	Divide and Recombine (DandR) with Tessera: High-Performance Computing for the Analysis of Big Data and High-Complexity Analytics
Author(s):	Yuying Song* and Bowei Xi and Ryan Hafen and William S. Cleveland
Companies:	Purdue University and Purdue University and Hafen Consulting and Purdue University
Keywords:	Big data ; Divide and Recombine ; computational performance ; Hadoop
Abstract:	The widely used term "big data" carries with it a notion of computational performance for the analysis of big datasets. For data analysis, however, computational performance depends heavily not just on dataset size but also on the computational complexity of the analytic routines used. Datasets that pose big computational challenges span a very wide range of sizes. Furthermore, the hardware power available to the data analyst is an important factor. High-performance computing for data analysis can be provided across wide ranges of dataset size, computational complexity, and hardware power by the Divide and Recombine (D&R) statistical approach and the Tessera D&R software implementation. Designed experiments for computational performance measurement and analysis reveal important properties of the Tessera computations.

Authors who are presenting talks have a * after their name.