
JSM 2012 Online Program

The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.


Abstract Details

Activity Number: 486
Type: Invited
Date/Time: Wednesday, August 1, 2012, 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Computing
Abstract - #303783
Title: Scaling R to Internet Scale Data
Author(s): Karl Millar*+ and Murray Stokely
Companies: Google and Google
Address: 1600 Amphitheatre Pkwy, Mountain View, CA, 94043, United States
Keywords: Statistical Computing; R; parallelism; MapReduce

Analyzing internet-scale data sets requires statistical software that scales to data sizes several orders of magnitude larger than R can currently handle. Tools such as MapReduce and Hadoop scale to data of this size, but they are impractical for statisticians to use directly for data analysis.

We will discuss the overall design and API of a set of packages, built on Google's distributed computing architecture, that help address these issues.

The foundation of this work is a package that provides an R version of the FlumeJava library of Chambers et al. for distributed computation. The package offers a higher-level abstraction on top of Google's MapReduce framework: distributed generic collection classes and simple functions for manipulating them. Operations on these collections are automatically converted into an optimized sequence of MapReduce jobs.
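The deferred-evaluation idea behind FlumeJava can be illustrated with a small single-machine sketch. This is not the API of the R package or of FlumeJava: the class `Deferred` and the methods `parallel_do`, `group_by_key`, `plan`, and `run` are hypothetical names chosen for illustration. Operations are only recorded until `run()` is called; the planner then fuses consecutive element-wise steps into one stage (the "ParallelDo fusion" optimization described by Chambers et al.), so a chain of maps costs one pass instead of several. In a real system each stage would run as part of a MapReduce job across many machines; here everything runs on in-memory lists.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable, List


def _apply_chain(fns: List[Callable[[Any], Iterable[Any]]], x: Any) -> List[Any]:
    """Apply a fused chain of flat-map functions to a single element."""
    items = [x]
    for fn in fns:
        items = [y for item in items for y in fn(item)]
    return items


class Deferred:
    """Hypothetical in-memory stand-in for a FlumeJava-style distributed collection.

    Each method returns a new Deferred with one more recorded step;
    nothing is computed until run() is called.
    """

    def __init__(self, source: List[Any], ops=None):
        self.source = source
        self.ops = ops or []  # list of ("do", fn) or ("group", None) steps

    def parallel_do(self, fn: Callable[[Any], Iterable[Any]]) -> "Deferred":
        # fn maps one element to zero or more output elements
        return Deferred(self.source, self.ops + [("do", fn)])

    def group_by_key(self) -> "Deferred":
        # elements must be (key, value) pairs; this step is a shuffle boundary
        return Deferred(self.source, self.ops + [("group", None)])

    def plan(self):
        """Fuse runs of consecutive parallel_do steps into single stages."""
        stages = []
        for kind, fn in self.ops:
            if kind == "do" and stages and stages[-1][0] == "do":
                stages[-1][1].append(fn)  # extend the current fused stage
            else:
                stages.append((kind, [fn] if kind == "do" else []))
        return stages

    def run(self) -> List[Any]:
        data = list(self.source)
        for kind, fns in self.plan():
            if kind == "do":
                # one pass over the data for the whole fused chain
                data = [y for x in data for y in _apply_chain(fns, x)]
            else:
                groups = defaultdict(list)
                for k, v in data:
                    groups[k].append(v)
                data = sorted(groups.items())
        return data


lines = Deferred(["a b a", "b c"])
counts = (lines
          .parallel_do(lambda line: line.split())
          .parallel_do(lambda w: [(w, 1)])
          .group_by_key()
          .parallel_do(lambda kv: [(kv[0], sum(kv[1]))]))
print(len(counts.plan()))  # 3 stages: fused map, group, reduce
print(counts.run())        # [('a', 2), ('b', 2), ('c', 1)]
```

The two leading `parallel_do` calls collapse into a single stage, which mirrors how a chain of collection operations can become one map phase rather than two separate jobs.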

On top of this functionality, Google is developing additional packages that provide distributed implementations of common statistical algorithms, along with tools for analyzing large data sets.
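One common building block for distributed statistical algorithms is a summary that can be computed independently on each shard of the data and then merged associatively, which fits the map and combine phases of a MapReduce. The sketch below assumes nothing about the packages' actual design; the class `Moments` and its methods are hypothetical. It computes a mean and sample variance using Welford's update within a shard and the pairwise combination formula of Chan, Golub, and LeVeque across shards.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable


@dataclass(frozen=True)
class Moments:
    """Hypothetical mergeable summary: count, mean, and sum of squared deviations."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    @staticmethod
    def of(xs: Iterable[float]) -> "Moments":
        # Welford's single-pass update over the values in one shard
        acc = Moments()
        for x in xs:
            n = acc.n + 1
            delta = x - acc.mean
            mean = acc.mean + delta / n
            acc = Moments(n, mean, acc.m2 + delta * (x - mean))
        return acc

    def merge(self, other: "Moments") -> "Moments":
        # Pairwise combination; associative, so shards can be merged in any grouping
        if self.n == 0:
            return other
        if other.n == 0:
            return self
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Moments(n, mean, m2)

    def variance(self) -> float:
        # Sample variance with an (n - 1) denominator
        if self.n < 2:
            raise ValueError("need at least two observations")
        return self.m2 / (self.n - 1)


shards = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
total = reduce(Moments.merge, (Moments.of(s) for s in shards), Moments())
print(total.n, total.mean, total.variance())  # 6 3.5 3.5
```

Because `merge` is associative and `Moments()` acts as an identity, the per-shard summaries could be combined in a tree of reducers rather than on a single machine.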

The address information is for the authors who have a + after their name.
Authors who are presenting talks have a * after their name.


