Online Program


All Times ET

Friday, June 4
Education
Data Science Education and Applications
Fri, Jun 4, 1:20 PM - 2:55 PM
TBD
 

Reproducible and collaborative data science with the RENKU platform (309830)

Presentation

Christine Choirat, Swiss Data Science Center 
*Oksana Riba, EPFL Swiss Data Science Center 
Rok Roskar, Swiss Data Science Center 

Keywords: Reproducibility, Docker, Workflows, Continuous Integration

Communities and funding sources are increasingly demanding reproducibility and FAIRness in scientific work. We present RENKU (https://renkulab.io/), a free and open-source platform for reproducible and collaborative data science developed by the Swiss Data Science Center, a joint initiative of ETH Zürich and EPFL. The platform aims to lower technical barriers and help users embrace data science best practices. Specifically, RENKU provides seamless integration of interactive sessions (RStudio, Jupyter, or VSCode); Git version control, with Git LFS to handle data; CI/CD; automatic workflow generation via a command-line tool; and a dataset abstraction with flexible metadata and connectors to external data repositories. A knowledge graph, built within and across projects, links all of the above together to enable search, discovery, and reuse in accordance with the FAIR principles. The knowledge graph can be extended with domain-specific controlled vocabularies to tailor the metadata model to specific use cases. With RENKU, every step of the data science process that generates new code or data is preserved.
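To make the idea of workflow capture and provenance more concrete, here is a minimal, hypothetical Python sketch. It is not RENKU's implementation or API (in RENKU, the command-line tool records this information automatically); the names `Step`, `ProvenanceGraph`, `record`, and `lineage`, as well as the file names, are illustrative assumptions. Each recorded step links a command to the files it read and wrote, and following those links backwards from a result yields the ordered chain of steps needed to regenerate it.

```python
"""Hypothetical sketch of provenance capture and lineage lookup.

NOT RENKU's implementation or API: it only illustrates recording each
workflow step (command, inputs, outputs) and linking the steps into a
small graph that can be queried for the lineage of a file.
"""
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Step:
    """One recorded workflow step: a command plus the files it read and wrote."""
    command: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


@dataclass
class ProvenanceGraph:
    """Minimal in-memory graph linking each file to the step that generated it."""
    steps: list[Step] = field(default_factory=list)
    generated_by: dict[str, Step] = field(default_factory=dict)

    def record(self, command: str, inputs: list[str], outputs: list[str]) -> Step:
        """Store a step and register it as the generator of its outputs."""
        step = Step(command, tuple(inputs), tuple(outputs))
        self.steps.append(step)
        for path in outputs:
            # The most recent step that wrote a file is treated as its generator.
            self.generated_by[path] = step
        return step

    def lineage(self, path: str) -> list[Step]:
        """Return the steps needed to regenerate `path`, dependencies first."""
        ordered: list[Step] = []
        seen: set[Step] = set()

        def visit(p: str) -> None:
            step = self.generated_by.get(p)
            if step is None or step in seen:
                return  # raw input file, or step already scheduled
            seen.add(step)
            for upstream in step.inputs:
                visit(upstream)
            ordered.append(step)  # post-order: upstream steps come first

        visit(path)
        return ordered


if __name__ == "__main__":
    graph = ProvenanceGraph()
    graph.record("python clean.py raw.csv clean.csv", ["raw.csv"], ["clean.csv"])
    graph.record("python fit.py clean.csv model.pkl", ["clean.csv"], ["model.pkl"])
    graph.record("python plot.py clean.csv fig.png", ["clean.csv"], ["fig.png"])
    for step in graph.lineage("model.pkl"):
        print(step.command)
    # prints:
    # python clean.py raw.csv clean.csv
    # python fit.py clean.csv model.pkl
```

Because the lineage query walks only the recorded links, the unrelated plotting step is not part of the model's lineage, which is the kind of information that makes re-execution and reuse possible.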

RENKU relies on container technology and provides base images that can build on existing Docker resources, such as those created and maintained by the Rocker and Bioconductor communities. RENKU also lets users further customize their software environment, for example for data visualization, GIS, deep learning, or bioinformatics.

The RENKU platform can be deployed with Kubernetes on public clouds or on-premises infrastructure. All of the software, including deployment recipes, is open source and publicly available. User projects are simply Git repositories (with some bells and whistles added), which ensures there is no “vendor lock-in”: users who choose to stop using RENKU can take their projects anywhere they have access to Git.