Keywords: spark,streaming,arrow,clusters,realtime,distributed systems,r,rstats
In this talk you will learn how to analyze large datasets in real time from R, using Apache Spark through the sparklyr R package. We will briefly introduce Apache Spark and sparklyr, then give a few examples and resources for using dplyr, broom, MLlib and mleap with Spark from R.
The talk will then introduce Structured Streaming in Spark using R, and discuss various use cases along with the supported tools and workflows.
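To give a flavor of Structured Streaming from R, here is a minimal sketch rather than material from the talk itself. It assumes a local Spark installation (for example via `spark_install()`) and uses hypothetical folder names and a hypothetical column `x`; only the `stream_read_csv()`, `stream_write_csv()` and `stream_stop()` calls come from sparklyr.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally, e.g. with sparklyr::spark_install()
sc <- spark_connect(master = "local")

# Hypothetical input and output folders for this sketch
source_dir <- file.path(tempdir(), "source")
dest_dir   <- file.path(tempdir(), "destination")
dir.create(source_dir, showWarnings = FALSE)

# Seed the input folder with one CSV file so the stream can infer a schema
write.csv(data.frame(x = 1:10),
          file.path(source_dir, "data_1.csv"),
          row.names = FALSE)

# Read the folder as a stream, transform it with dplyr, write results as CSV
stream <- stream_read_csv(sc, source_dir) %>%
  filter(x > 5) %>%
  stream_write_csv(dest_dir)

# Give the stream a moment to process the file, then shut everything down
Sys.sleep(5)
stream_stop(stream)
spark_disconnect(sc)
```

Any new CSV file dropped into the input folder while the stream is running would be picked up and filtered in the same way.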
You will also learn how to configure Apache Arrow with R on Apache Spark, which can bring speed improvements and expand the scope of your data science workflows, for instance by enabling data to be transferred efficiently between your local environment and Apache Spark.
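As a rough illustration of the Arrow setup, the sketch below assumes sparklyr 1.0 or later, the arrow R package and a local Spark installation. In that setup, attaching arrow alongside sparklyr is enough for transfers such as `copy_to()` and `collect()` to use Arrow; the table name `arrow_demo` and the sample data are made up for this example.

```r
library(sparklyr)
library(arrow)   # attaching arrow lets sparklyr use it for data transfers
library(dplyr)

# Assumption: Spark is installed locally, e.g. with sparklyr::spark_install()
sc <- spark_connect(master = "local")

# Made-up sample data living in the local R session
local_df <- data.frame(x = runif(1e5))

# Copy the data to Spark; with arrow attached this transfer is Arrow-backed
spark_tbl <- copy_to(sc, local_df, "arrow_demo", overwrite = TRUE)

# Aggregate in Spark and bring the small result back into R
result <- spark_tbl %>%
  summarise(mean_x = mean(x, na.rm = TRUE)) %>%
  collect()

spark_disconnect(sc)
```

Running the same script without `library(arrow)` should give the same result through the default serializer, which makes it easy to compare transfer times on your own data.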