Activity Number: 216 - Contributed Poster Presentations: Section on Statistics and Data Science Education
Type: Contributed
Date/Time: Tuesday, August 4, 2020, 10:00 AM to 2:00 PM
Sponsor: Section on Statistics and Data Science Education
Abstract #314031
Title: Uncovering and Utilizing Data Analysis Workflows: Data Provenance in ISLE
Author(s): Philipp Burckhardt*, Rebecca Nugent, and Christopher R Genovese
Companies: Carnegie Mellon University and Carnegie Mellon University and Carnegie Mellon University
Keywords: research; writing; applied statistics; data science; software; tools
Abstract:

The final output of a data analysis project is commonly a report summarizing the main findings. While we hope and assume that the presented results are correct, the finished report offers limited insight into how the sequence of data analysis decisions might have shaped its conclusions (nor does it highlight any mistakes that might have been made!). The Integrated Statistics Learning Environment (ISLE) supports and can track the entire data analysis workflow: loading and transforming data, exploratory data analysis with statistics and graphs, model fitting, and report writing. Because all of these steps are integrated into one software tool, any data analysis session can be reproduced and replayed. Through an interactive history and a browsable action log, both the original analyst and external reviewers (e.g., instructors) can characterize and comment on the strengths and weaknesses of an analysis. We will share student and instructor experiences with this data provenance functionality in an introductory statistics course at Carnegie Mellon University.
Authors who are presenting talks have a * after their name.