Keywords: data science, r, cloud, collaboration
Working with your favorite data science tools in the cloud doesn’t have to be crippling. I want to show some cool models for collaborating (both with yourself and with others) in a cloud-based environment across many popular data science tools. Using a basic Amazon Web Services setup, I’ll show how I create and launch customized data science tools (R, Jupyter, Zeppelin, Spark, H2O…), how I choose which tools to use for different projects, and how I structure a data science workflow that maximizes my ability to collaborate at every step of the process.
Topics to discuss:
- Why is collaboration important, and why is reproducible research important?
- How collaboration can be fun/easy and not scary/difficult in the cloud
- Specifics of Jeff Allen’s plumber package in R
- Example models: APIs and port connection management for super-charging your data science abilities with hybrid tool creation, testing/validation, and select forms of deployment