Introductory Overview Lecture: Reproducibility, Efficient Workflows, and Rich Environments — Invited Special Presentation
JSM Partner Societies
Organizer(s): Ryan Tibshirani, Carnegie Mellon University
Chair(s): Jacob Bien, University of Southern California
With computing playing an increasingly central role in statistical research, the proliferation of tools, environments, and languages has increased both the power and the complexity of modern data analysis.
When encountering the results of an analysis, several questions arise:
Are the computations accurate? What results were actually computed? Is the computation robust to changes in the size or structure of the data? How were the model parameters tuned? How do the results change when the parameters are adjusted? Are the results generalizable? Are they reproducible?
This session looks at ways to answer these questions, exploring a range of issues surrounding the effective use of computing in statistical research and data analysis. The focus is particularly on getting the most out of rich environments, building efficient workflows, and organizing computations to encourage validity, reproducibility, and collaborative sharing. The session will begin with a framing of the issues and an overview of current methods by Victoria Stodden, and then delve into more specific issues in talks by Christopher Genovese and Hadley Wickham.