All Times EDT

Friday, June 5

Software and Data Science Technologies 1

Fri, Jun 5, 11:15 AM - 12:50 PM
TBD

Creating Optimal Conditions for Reproducible Data Analysis in R with `Fertile` (308233)

Audrey Margaret Bertin, Smith College
*Benjamin S Baumer, Smith College

Keywords: reproducibility, statistical software, workflow, collaboration

The advancement of scientific knowledge increasingly depends on ensuring that data-driven research is reproducible: that two people with the same data obtain the same results. However, while the necessity of reproducibility is clear, significant behavioral and technical challenges impede its widespread implementation, and there is no clear consensus on what constitutes reproducibility in published research. We focus on a series of common mistakes programmers make while conducting data science projects in R, primarily through the RStudio integrated development environment. The `fertile` package for R operates in two modes: proactively, to prevent reproducibility mistakes from happening in the first place, and retroactively, by analyzing code that is already written for potential problems. Furthermore, `fertile` is designed to educate users about why these mistakes are problematic and how to fix them. We discuss experimental results from testing `fertile` in an introductory data science course.

ASA Meetings Department