Keywords: reproducibility, data pipelines, R
Achieving computational reproducibility in data science pipelines is a moving target. Packages for data science are being developed rapidly in both R and Python, the two main scripting languages in the field. As a result, an implemented data pipeline may produce different results when its underlying dependencies change. Focusing on R, we propose a paradigm for managing computational reproducibility that helps users identify not only when a package's functionality has changed, but also whether that change will affect the results of their project code.