Short Courses
SDSS 2021 offered two full-day and two half-day courses. Short courses are ticketed events that require an additional fee.
Full-Day
SC1 – Data Visualization with R
Instructor(s): Aaron Williams, Urban Institute
Data visualization plays a crucial role in the data science and statistics workflows. It is fundamental to everything from exploratory data analysis to communicating results. Data scientists and statisticians can better understand data and more effectively communicate their work by understanding how to better visualize their data. Too often, however, visualization is an afterthought.
In this course, attendees will learn the core principles of data visualization: how we perceive visual information; the layered grammar of graphics; and best practices for creating effective visualizations. To put these principles to work, attendees will learn practical skills for R programming that improve the quality of their work and teach them to program away the mundane. The course will focus on the popular R package ggplot2 and the reproducible research framework R Markdown. All R instruction will begin with a clear motivation, followed by an explanation of the approach and code, and end with hands-on examples.
SC2 – Deep Learning in Statistics
Instructor(s): Annie Qu, University of California, Irvine; Xiao Wang, Purdue University; and Edgar Dobriban, University of Pennsylvania
This short course is for those who are new to data science and interested in understanding cutting-edge machine learning and deep learning models. It is for those who want to become familiar with the core concepts behind these learning algorithms and their successful applications and who want to start thinking about how machine learning and deep learning might be useful in their research, business, or career development. The course will provide a comprehensive overview of statistical machine learning and deep learning methods. Topics include classical methods and modern techniques, including basic machine learning tools, supervised and unsupervised learning, deep neural networks, computational algorithms and software for deep learning, and various applications of deep learning.
Half-Day
SC3 – Artificial Intelligence, Machine Learning, and Precision Medicine
Instructor(s): Haoda Fu, Eli Lilly
In this half-day short course, I will provide an overview of statistical machine learning and artificial intelligence techniques, with applications to precision medicine, particularly deriving optimal individualized treatment strategies for personalized medicine. We will cover both treatment selection and treatment transition. The treatment selection framework is based on outcome-weighted classification. We will discuss logistic regression, support vector machines (SVM), ψ-learning, robust SVM, and angle-based classifiers for multi-category learning. I will show how to modify these classification methods into outcome-weighted learning algorithms for personalized medicine. The second part of this course will cover treatment transition. I will provide an introduction to reinforcement learning techniques. Algorithms covered will include dynamic programming for Markov decision processes, temporal difference learning, SARSA, Q-learning, and actor-critic methods. We will discuss how to use these methods for developing optimal treatment transition strategies. The techniques discussed will be demonstrated in R. This course is intended for graduate students who have some knowledge of statistics and want to be introduced to statistical machine learning, or practitioners who would like to apply statistical machine learning techniques to their problems in personalized medicine and other biomedical applications.
SC4 – Data Quality for Data Science and Statistics: A Survey and Practical Application
Instructor(s): Henry Li, Bigeye
Data science and statistics become more important in society every year—as a prime example, consider the sudden influx of public interest in COVID-19 tracking projects such as the tracker from 1Point3Acres. From published research that guides policy to the online predictive systems that set prices and control what we read, high-quality and reliable input data is a necessary (but not sufficient!) condition for quality outcomes.
This half-day course will cover the impact of data quality issues on data science and statistics work, taxonomies of data quality issues that can occur, a survey of current techniques and tools for issue identification, and how to start including data quality techniques in one’s data science work process.