All Times EDT

Abstract Details

Activity Number: 216 - Contributed Poster Presentations: Section on Statistics and Data Science Education
Type: Contributed
Date/Time: Tuesday, August 4, 2020 : 10:00 AM to 2:00 PM
Sponsor: Section on Statistics and Data Science Education
Abstract #314031
Title: Uncovering and Utilizing Data Analysis Workflows: Data Provenance in ISLE
Author(s): Philipp Burckhardt* and Rebecca Nugent and Christopher R Genovese
Companies: Carnegie Mellon University and Carnegie Mellon University and Carnegie Mellon University
Keywords: research; writing; applied statistics; data science; software; tools

The final output of a data analysis project is commonly a report summarizing the main findings. While we hope and assume that the presented results are correct, the finished report provides limited insight into how the sequence of data analysis decisions might have shaped its conclusions, nor does it highlight any mistakes that might have been made. The Integrated Statistics Learning Environment (ISLE) supports and can track the entire data analysis workflow, from loading and transforming data, through exploratory data analysis with statistics and graphs, to fitting models and writing the report. With all steps integrated into one software tool, it becomes possible to reproduce and replay any data analysis session. Through an interactive history and browsable action log, both the original analyst and external reviewers (e.g., instructors) can characterize and comment on the strengths and weaknesses of the data analysis. We will share our student and instructor experience with this data provenance functionality in an introductory statistics course at Carnegie Mellon University.

Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program