Abstract Details

Activity Number: 497 - Cloud and Distributed Computing for Statisticians
Type: Invited
Date/Time: Wednesday, August 1, 2018 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Computing
Abstract #326581
Title: Distributed Data Science with Sparklyr
Author(s): Kevin Kuo and Javier Luraschi*
Companies: RStudio
Keywords: distributed computing; big data; rstats; spark; r; machine learning
Abstract:

Recent developments in computing technologies are enabling a wide range of applications, from large-scale genome analysis to real-time predictive maintenance based on streaming sensor data. In many organizations, however, statisticians and data scientists who use R have found it difficult to take advantage of these computing resources because of a gap in software engineering skill sets. RStudio, along with the open-source R community, has been bridging this gap by providing intuitive interfaces to the Apache Spark ecosystem. In this session, we provide an overview of the sparklyr ecosystem and show how it enables cluster computing applications.


Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program