Abstract Details

Activity Number: 644 - Statistical Computing on Parallel Architectures
Type: Invited
Date/Time: Thursday, August 2, 2018 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Computing
Abstract #326566
Title: Deferred Evaluation for Scalable Computing in R
Author(s): Michael Lawrence*
Companies: Genentech
Keywords: R; Parallel computing; Spark; Distributed computing

Typical R workflows load an entire dataset into memory. When data are large, it is no longer feasible to load all of the data, or even to have all of the data local to the R session. Instead, we need to push computation to the data, which might be located remotely and potentially spread across a computing cluster. We aim to hide this complexity from the user. A general approach is to capture ordinary R code and defer evaluation until the code represents a reduction of the data to a manageable size. We have applied deferred evaluation to implement the base R API separately on top of Solr and Spark. This talk will review those interfaces and discuss the potential for generalization.
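
The idea of deferring evaluation until a reduction can be illustrated with a small, language-neutral sketch. The Python below is not the author's R implementation and does not use Solr or Spark. The class `DeferredColumn`, the function `fetch_scores`, and the sample values are all hypothetical, chosen only to show the pattern: filters and transformations are recorded rather than run, and the data source is touched only when a reduction such as `sum` is requested.

```python
class DeferredColumn:
    """Hypothetical deferred vector: records operations, evaluates on reduction."""

    def __init__(self, fetch, ops=()):
        self._fetch = fetch      # callable standing in for remote data access
        self._ops = tuple(ops)   # recorded ("map" | "filter", function) steps

    def map(self, fn):
        # Record the transformation; no data is read here.
        return DeferredColumn(self._fetch, self._ops + (("map", fn),))

    def filter(self, pred):
        # Record the predicate; no data is read here.
        return DeferredColumn(self._fetch, self._ops + (("filter", pred),))

    def _stream(self):
        # Apply the recorded steps one value at a time, in recorded order.
        for value in self._fetch():
            keep = True
            for kind, fn in self._ops:
                if kind == "map":
                    value = fn(value)
                elif not fn(value):
                    keep = False
                    break
            if keep:
                yield value

    # Reductions: the only methods that trigger evaluation.
    def sum(self):
        return sum(self._stream())

    def count(self):
        return sum(1 for _ in self._stream())

    def mean(self):
        total, n = 0.0, 0
        for value in self._stream():
            total += value
            n += 1
        if n == 0:
            raise ValueError("mean of an empty selection")
        return total / n


fetch_calls = []  # records each time the (pretend) remote source is read


def fetch_scores():
    # Assumption: stands in for a remote or distributed data source.
    fetch_calls.append(1)
    return iter([3, 8, 12, 5, 20])


scores = DeferredColumn(fetch_scores)
large = scores.filter(lambda x: x > 4).map(lambda x: x * 2)
calls_before_reduction = len(fetch_calls)  # still 0: nothing evaluated yet
total = large.sum()                        # evaluation happens here
```

In a real backend the recorded steps would presumably be translated into a query or job that runs where the data live, so that only the reduced result returns to the session. This sketch simply streams values locally to keep the example self-contained.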

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program