Name: 2018 Joint Statistical Meetings
Start: 2018-07-28T07:00:00+00:00
End: 2018-08-02
Location: Vancouver Convention Centre

Activity Number:	644 - Statistical Computing on Parallel Architectures
Type:	Invited
Date/Time:	Thursday, August 2, 2018 : 10:30 AM to 12:20 PM
Sponsor:	Section on Statistical Computing
Abstract #326566
Title:	Deferred Evaluation for Scalable Computing in R
Author(s):	Michael Lawrence*
Companies:	Genentech
Keywords:	R; Parallel computing; Spark; Distributed computing
Abstract:	Typical R workflows load an entire dataset into memory. When data are large, it is no longer feasible to load all of the data, or even to keep all of the data local to the R session. Instead, we need to push computation to the data, which might be stored remotely and spread across a computing cluster. We aim to hide this complexity from the user. A general approach is to capture ordinary R code and defer its evaluation until the code reduces the data to a manageable size. We have applied deferred evaluation to implement the base R API separately on top of Solr and on top of Spark. This talk will review those interfaces and discuss how the approach might generalize.
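The abstract does not show the interfaces themselves. The following Python sketch illustrates the general idea of deferred evaluation under our own assumptions. The class `Deferred`, its methods, and the in-memory lists standing in for remote data partitions are all hypothetical, and this is not the R, Solr, or Spark interface the talk describes. Element-wise operations are only recorded. When a reduction such as `sum` or `mean` is requested, the recorded pipeline runs on each partition, and only small per-partition results are combined locally.

```python
from typing import Callable, List, Tuple


class Deferred:
    """Hypothetical lazy vector: records operations instead of running them."""

    def __init__(self, partitions: List[List[float]], ops=None):
        # Partitions stand in for data chunks that would live on remote workers.
        self.partitions = partitions
        self.ops: List[Tuple[str, Callable]] = ops or []

    def map(self, f: Callable) -> "Deferred":
        # No computation happens here; the operation is only appended.
        return Deferred(self.partitions, self.ops + [("map", f)])

    def filter(self, pred: Callable) -> "Deferred":
        return Deferred(self.partitions, self.ops + [("filter", pred)])

    def _run_partition(self, part: List[float]) -> List[float]:
        # In a real system this would execute "next to the data" on a worker.
        for kind, f in self.ops:
            if kind == "map":
                part = [f(x) for x in part]
            else:
                part = [x for x in part if f(x)]
        return part

    def _partials(self) -> List[Tuple[float, int]]:
        # Each partition reduces to a (sum, count) pair, so only small
        # results travel back to the client session.
        results = (self._run_partition(p) for p in self.partitions)
        return [(sum(r), len(r)) for r in results]

    def sum(self) -> float:
        return sum(s for s, _ in self._partials())

    def count(self) -> int:
        return sum(c for _, c in self._partials())

    def mean(self) -> float:
        partials = self._partials()
        n = sum(c for _, c in partials)
        return sum(s for s, _ in partials) / n if n else float("nan")


x = Deferred([[1, 2, 3], [4, 5, 6]])
y = x.map(lambda v: v * 2).filter(lambda v: v > 4)  # nothing evaluated yet
print(y.sum(), y.count(), y.mean())  # 36 4 9.0
```

Building `y` only records two operations. The data are touched once a reduction is requested, which is the point at which the result becomes small enough to bring back to the session.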

Authors who are presenting talks have a * after their name.

JSM 2018 Online Program