Abstract:
|
Cloud computing platforms such as AWS, GCS, and Azure provide statisticians and data scientists with virtually limitless data storage and computational capacity at relatively low cost, and they increasingly offer higher-level data analysis and machine learning tools. Yet for R users interested in cloud computing and storage, managing these resources often requires point-and-click interaction with a web console or the installation of constantly evolving command-line tools. The open-source CloudyR Project (http://cloudyr.github.io/) attempts to reconcile the increasing demand for high-performance computing, the emergence of R as a first-class data science language, and the difficulty of managing cloud resources in a reproducible manner. CloudyR developers have begun work on native R packages for the AWS and GCS platforms. This presentation will give an overview of CloudyR's goals, introduce the dependency-free tools currently available for using AWS and GCS cloud services natively from R, discuss future plans for the project and how to get involved in its development, and reflect on lessons learned from the project for cloud computing and for R development generally.
|