Online Program

Thursday, October 19
Thu, Oct 19, 3:50 PM - 5:00 PM
Aventine Ballroom E
Speed Session 2

Jupyter Notebook for Reproducible Research: A Case Study - Predicting Breast Cancer Survivability (303910)

Cynthia Hudson Vitale, Washington University 
Thomas Burroughs, Saint Louis University 
Anthony Juehne, Washington University 
Leslie D McIntosh, Washington University 
*Lorinette Saphire Wirth, Saint Louis University 
Connie Zabarovskaya, Washington University 

Keywords: Reproducible Research, Breast Cancer, Data Science, Machine Learning, Jupyter Notebook, SEER

Reproducibility is a core component of the scientific method, yet irreproducible research is common in science. The paucity of reproducible research raises questions about the validity, quality, and overall transparency of scientific research, and it may be partially responsible for public distrust in science. Reproducible research makes data and code accessible alongside study results so that data analyses can be successfully repeated. However, several practical barriers stand in the way, including a lack of knowledge about reproducible practices. Computational notebooks can help lower these barriers. Jupyter Notebook is an open-source, web-based environment for writing text, executing code, and displaying output, which together produce a clean ‘publication’. As a case study in reproducible research, we use R to predict breast cancer survivability with data science techniques, in an attempt to reproduce an earlier study (bit.ly/2ovDfEq). We aim to: 1) illustrate Jupyter Notebook as a tool for making text, code, and results transparent; and 2) understand current breast cancer survivability using public data.