Abstract Details

Activity Number: 668 - Best Practices for Programming and Analysis
Type: Contributed
Date/Time: Thursday, August 2, 2018 : 10:30 AM to 12:20 PM
Sponsor: Section for Statistical Programmers and Analysts
Abstract #329614
Title: Adapr: An R Package for an Accountable Data Analysis Process
Author(s): Jonathan Gelfond* and Martin Goros and Brian Hernandez and Alex Bokov
Companies: University of Texas Health San Antonio and UT Health San Antonio and UT Health San Antonio and UT Health San Antonio
Keywords: reproducibility; accountability; R language; version control; data pipeline; collaboration
Abstract:

Producing transparent analyses efficiently can be difficult for beginners and tedious for experienced analysts. This implies a need for computing systems and environments that not only satisfy reproducibility and accountability standards, but also allow analysts to explore, operate, collaborate on, and refine complex analytical structures. To this end, we have developed an R package and R Shiny application called 'adapr' (Accountable Data Analysis Process in R) that is built on the principle of accountable units. An accountable unit is a data file (statistic, table or graphic) that can be associated with a provenance: when and how it was created, and who created it. Accountable units are similar to the concept of verifiable computational results proposed by Gavish and Donoho. A key element is that an individual accountable unit is sharable and can be incorporated into a collaborative project. Reproducing collaborative work may be highly complex, since it requires repeating computations from multiple authors on multiple systems; determining the provenance of each unit is simpler, requiring only a search using file hashes and version control systems.
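The hash-based provenance search described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not adapr's API: the names `file_hash`, `register_unit`, and `find_provenance`, the in-memory `ledger` dictionary, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def file_hash(path):
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large result files need not fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def register_unit(ledger, path, script, author):
    """Record how, when, and by whom a result file was created.

    `ledger` is a plain dict keyed by content hash (hypothetical storage;
    a real system would persist this alongside version control).
    """
    record = {
        "file": Path(path).name,
        "script": script,
        "author": author,
        "created": datetime.now(timezone.utc).isoformat(),
    }
    ledger[file_hash(path)] = record
    return record


def find_provenance(ledger, path):
    """Look up a file's provenance by content hash; None if unknown."""
    return ledger.get(file_hash(path))
```

Because the lookup key is the content hash rather than the file name, a shared copy of a table or graphic can be traced back to its record even after it is renamed, while any change to its contents makes the lookup return `None`. The version control side of the search (finding the commit of the generating script) is not shown here.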


Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program