Keywords: cloud computing, open infrastructure, open source
As many of our modern computational workflows move to the cloud, new opportunities arise to use open-source tooling for scientific interactivity, collaboration, and reproducibility. While much attention has been paid to specific analytics packages in the open-source ecosystem, less thought has been given to how these packages will be *run* in the cloud. It is crucial that the open-source scientific stack also include a collection of open-source, cloud-agnostic tools for managing resources and environments in the cloud, a collection we call the “Open Infrastructure Stack”.
This talk will describe recent projects in “Open Infrastructure” from the Jupyter community. These are cloud-native projects that enable open-source workflows in data analytics, scientific analysis, and education. Importantly, they are community-driven and cloud-agnostic: they focus on giving users the flexibility to run infrastructure on any of several cloud providers or on institutional hardware.
The centerpiece of this stack is JupyterHub, a tool for deploying interactive analytics environments that are remotely accessible. JupyterHub is workflow- and cloud-agnostic, and can serve environments that span many languages, interfaces, and kinds of computational infrastructure. JupyterHub distributions integrate with other open infrastructure (such as Kubernetes), and can be used for both small (tens of users) and large (thousands of users) deployments.
We’ll cover the structure and function of several marquee deployments of JupyterHub that span major use cases in science and education. We will describe the process and technical deployment behind Data 8, a 1,400-student course with a JupyterHub that all students use for their work. We’ll also cover Pangeo, a collaboration between multiple institutes and projects that uses a JupyterHub along with an open-source stack to facilitate large-scale geospatial analysis in the cloud.