Online Program

Friday, May 31
Data Science Technologies
Data Science Platforms: Spark
Fri, May 31, 10:30 AM - 12:05 PM
Grand Ballroom E
 

Scaling Sparklyr with Streams and Arrow (305029)

*Javier Luraschi, RStudio 

Keywords: spark, streaming, arrow, clusters, realtime, distributed systems, r, rstats

In this talk, you will learn how to analyze large datasets in real time from R, using Apache Spark through the sparklyr R package. We will briefly introduce Apache Spark and sparklyr, then share a few examples and resources for using dplyr, broom, MLlib and mleap with Spark from R.

This talk will introduce Structured Streaming in Spark using R, and we will discuss various use cases along with the supported tools and workflows.

You will also learn how to easily configure Apache Arrow with R on Apache Spark, which can speed up your work and expand the scope of your data science workflows, for instance by making data transfer between your local environment and Apache Spark more efficient.