Keywords: data science, statistics, intro stats, R, reproducibility, data visualization, data wrangling, git, GitHub, rmarkdown
The introductory statistics course has evolved over the years and taken various forms depending on its target audience. In this talk we discuss a data science course designed to serve as a gateway to the discipline of statistics, to the statistics major, and more broadly to quantitative studies. The course is intended for students with little to no computing or statistical background, and it focuses on data wrangling, exploratory data analysis, data visualization, and effective communication. Unlike most traditional introductory statistics courses, this course approaches statistics from a model-based perspective and introduces simulation-based and Bayesian inference later in the course. A heavy emphasis is placed on reproducibility (with R Markdown) and on version control and collaboration (with git/GitHub). We will discuss the course structure, logistics, and pedagogical considerations in detail, and give examples from the case studies used in the course. We will also share student feedback, an assessment of the course's success in recruiting students to the statistical science major, and our experience of growing the course from a small seminar for first-year undergraduates into a larger course open to the entire undergraduate student body.