Online Program

Saturday, October 21
Knowledge
Sat, Oct 21, 11:45 AM - 1:15 PM
Aventine Ballroom A
Tools for Data Science

Master of Many Machines: Data Science Collaboration Across Jupyter, Zeppelin, and R in the Cloud (303839)

*Kelly O'Briant, B23 

Keywords: data science, r, cloud, collaboration

Working with your favorite data science tools in the cloud doesn’t have to be crippling. I want to show some cool models for collaborating (both with yourself and with others) in a cloud-based environment across many popular data science tools. Using a basic Amazon Web Services cloud platform, I’ll show how I create and launch customized data science tools (R, Jupyter, Zeppelin, Spark, H2O…), how I choose which tools to use for different projects, and how I conceptualize a data science workflow that maximizes my ability to collaborate at any step of the process.

Topics to discuss:
- Why is collaboration important, and why is reproducible research important?
- How collaboration can be fun/easy and not scary/difficult in the cloud
- Specifics of Jeff Allen’s plumber package in R
- Example models: APIs and port connection management for supercharging your data science abilities with hybrid tool creation, testing/validation, and select forms of deployment