The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.
Abstract Details
Activity Number: 354
Type: Contributed
Date/Time: Tuesday, July 31, 2012 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Computing
Abstract - #306031
Title: R and Hadoop: A Streaming Job
Author(s): A. Santos*+ and Nathan McIntyre
Companies: Return Path and Return Path
Address: 12150 Race St, Northglenn, CO, 80241, United States
Keywords: R; Hadoop; programming; efficiency
Abstract:
We present an example of using R with Hadoop's streaming capabilities to efficiently produce scores for millions of records. We demonstrate both how to set up R to run as a shell script and how to set up a Hadoop streaming job. This poster displays the results of our investigations into precompiled functions, choosing an optimum set size of records to process, the efficiency results from parallel processing, and which R function has the best read performance.
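As a rough illustration of the streaming contract described above (a script that reads records on standard input and writes scored records on standard output, a fixed-size set at a time), here is a minimal sketch. It is written in Python as a stand-in and is not the authors' R script. The names run_mapper, score_batch, and batch_size, the tab-separated record layout, and the scoring rule are all assumptions made for illustration.

```python
import io
from itertools import islice


def score_batch(rows):
    # Hypothetical scoring rule, NOT the authors' model: sum the numeric fields.
    return [sum(float(v) for v in values) for _key, values in rows]


def run_mapper(infile, outfile, batch_size=1000):
    """Score tab-separated records from infile in batches; return rows written."""
    written = 0
    while True:
        # Pull at most batch_size lines so memory use stays bounded.
        lines = list(islice(infile, batch_size))
        if not lines:
            break
        rows = []
        for line in lines:
            fields = line.rstrip("\n").split("\t")
            if fields[0]:  # skip blank lines
                rows.append((fields[0], fields[1:]))
        # Score the whole batch at once, then emit one "key<TAB>score" line per row.
        for (key, _values), s in zip(rows, score_batch(rows)):
            outfile.write(f"{key}\t{s:.4f}\n")
            written += 1
    return written


# Local check with in-memory streams standing in for stdin/stdout.
demo_out = io.StringIO()
run_mapper(io.StringIO("a\t1\t2\nb\t0.5\n"), demo_out, batch_size=1)
print(demo_out.getvalue(), end="")
```

In an actual streaming job, a script like this would call run_mapper(sys.stdin, sys.stdout) and be passed to Hadoop streaming through its -mapper option. The batch_size argument corresponds loosely to the set size the abstract says was tuned. The value 1000 is a placeholder, not a result from the poster.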
The address information is for the authors that have a + after their name.
Authors who are presenting talks have a * after their name.