SDSS 2019 will offer two full-day and four half-day courses on Wednesday. Short courses are ticketed events that require an additional fee.
8:00 a.m. – 5:30 p.m.
SC1 - Welcome to the Tidyverse: An Introduction to R for Data Science
Instructor(s): Garrett Grolemund, RStudio
Looking for an effective way to learn R? This one-day course will teach you a workflow for doing data science with the R language. It is built around R’s tidyverse, a core set of R packages known for their impressive performance and ease of use. We will focus on doing data science, not programming, and you’ll learn to do the following:
- Visualize data with R’s ggplot2 package
- Wrangle data with R’s dplyr package
- Fit models with base R
- Document your work reproducibly with R Markdown
Along the way, you will practice using R’s syntax, gaining comfort with R through many exercises and examples. Bring your laptop! The workshop will be taught by Garrett Grolemund, an award-winning instructor and the co-author of R for Data Science.
SC2 - Modeling in the Tidyverse
Instructor(s): Max Kuhn, RStudio
The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structure. In the last two years, a suite of tidyverse packages that focuses on modeling has been created. This course walks through the process of modeling data using these tools. It emphasizes modeling for prediction and inference, as well as feature engineering.
8:00 a.m. – 12:00 p.m.
SC3 - Data Visualization: Principles and Applications in R, Tableau, and Python
Instructor(s): Silas Bergen, Winona State University; Todd Iverson, Winona State University
In this course, participants will be introduced to principles of data visualization from foundational literature and implement these principles with hands-on activities using Tableau Public, Python (Altair), and R (ggplot2). The course instructors have experience teaching these concepts and content as part of undergraduate statistics and data science curricula and will use example class projects from these courses.
The course will be divided into two modules. Module 1 will cover the principles of data visualization theory, summarizing and illustrating foundational data visualization literature. Module 2 will demonstrate how these principles are applied in various software platforms.
Hands-on data visualization tasks will be used throughout. Participants must bring their own laptops.
SC4 - Reproducible Research with R
Instructor(s): Kara Woo, Sage Bionetworks
This course will introduce learners to reproducible workflows in R using R Markdown. We will discuss what reproducible research is, why it is important, and what common issues hinder reproducibility. The workshop will guide learners through hands-on exercises in R Markdown and show them how to create reproducible reports and share them on GitHub.
1:30 p.m. – 5:30 p.m.
SC5 - Introduction to Deep Learning
Instructor(s): Kevin Kuo, RStudio; Javier Luraschi, RStudio
This is a practical introduction to neural networks with interactive coding exercises in R. We provide an overview of different types of neural network architectures and how they can be used in a variety of applications.
SC6 - Text Mining with Tidy Data Principles
Instructor(s): Mara Averick, RStudio; Julia Silge, Stack Overflow
Text data is increasingly important in many domains, and tidy data principles and tidy tools can make text mining easier and more effective. In this short course, learn how to manipulate, summarize, and visualize the characteristics of text using these methods and R packages from the tidy tool ecosystem. These tools are highly effective for many analytical questions and allow analysts to integrate natural language processing into effective workflows already in wide use. Explore how to implement approaches such as sentiment analysis of texts, measuring tf-idf, and building text models.