All Times ET
Friday, February 19
Fri, Feb 19
9:00 AM - 11:00 AM
Virtual
T1 - Regression-Style Modeling with Variable Selection and Reduction
Tutorial
Instructor(s): Clay Barker, SAS / JMP; Ruth M Hummel, SAS Institute / JMP Division
Variable Selection is a crucial step in the model building process, whether we are building a predictive model or trying to understand the results of a designed experiment. Generalized Regression modeling provides a single framework for doing interactive variable selection and fitting generalized linear models. This workshop will start with a brief overview of the generalized linear model for modeling responses that are not necessarily normally distributed. We will also introduce variable selection techniques, including stepwise methods like Forward Selection and penalized regression methods like the Lasso. We close the workshop with examples featuring both observational and experimental data and a variety of response types.
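As a rough illustration of how a penalized method such as the Lasso performs variable selection, the following minimal sketch fits a Lasso by coordinate descent in plain Python. It is not taken from the workshop materials and is unrelated to the JMP Generalized Regression platform. The function names, the toy data, and the penalty values are all made up for illustration, and the sketch assumes centered predictors and response so that no intercept is needed.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return 0.0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_sweeps=100):
    """Minimize (1 / (2n)) * sum of squared residuals + penalty * sum(|coef|).

    Assumes X and y are centered (no intercept) and that no column of X
    is identically zero.
    """
    n, p = len(X), len(X[0])
    coefs = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0    # covariance of column j with the partial residual
            scale = 0.0  # mean squared value of column j
            for i in range(n):
                fit_without_j = sum(X[i][k] * coefs[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
                scale += X[i][j] ** 2
            coefs[j] = soft_threshold(rho / n, penalty) / (scale / n)
    return coefs


# Hypothetical toy data: the response depends mainly on the first predictor.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [1.0, -1.0, 0.0, -1.0, 1.0]
X = [[a, b] for a, b in zip(x1, x2)]
y = [-5.8, -3.1, -0.1, 2.9, 6.1]

unpenalized = lasso_coordinate_descent(X, y, penalty=0.0)
penalized = lasso_coordinate_descent(X, y, penalty=0.5)
print("penalty 0.0:", unpenalized)
print("penalty 0.5:", penalized)
```

With the penalty switched on, the weak second predictor is set exactly to zero while the first is shrunk toward zero, which is the sense in which the Lasso selects variables.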
Outline & Objectives
Outline
1. Brief Overview of Generalized Linear Models
2. Intro to Stepwise Variable Selection Methods
3. Intro to Penalized Regression Methods
4. Examples with nonnormal distributions, censoring, multicollinearity, etc.
(a) Performance objectives
By attending this presentation, participants will improve their knowledge of generalized linear models and variable selection techniques. They will also feel comfortable using these methods in software.
(b) Content and instructional methods
The presentation will alternate between the use of slides and software demonstrations. Handouts given to attendees will cover both.
About the Instructor
Dr. Clay Barker is a Senior Research Statistician Developer with JMP (a division of SAS), where he works on a variety of statistical platforms, including Generalized Regression, Fit Curve and Clustering. He earned his doctorate in statistics from North Carolina State University. He holds several patents, including one for his work on implementing new visualizations for interactive model building in generalized regression.
Dr. Ruth Hummel is an Academic Ambassador with JMP (a division of SAS), supporting the technical needs of professors and instructors who use JMP for teaching and research. Dr. Hummel is a coauthor of Business Statistics and Analytics in Practice, 9th edition (2018), and has been teaching and consulting about statistics and analytics for over a decade, at the University of Florida, at the US Environmental Protection Agency, and now at SAS/JMP. She has a PhD in Statistics from The Pennsylvania State University.
Relevance to Conference Goals
Career Development - Building regression models is a crucial part of data analysis. Sharpening these skills in modern software can be helpful for statisticians in every stage of their career. Performing variable selection in an interactive environment makes it quick and easy to communicate results and assess tradeoffs of different models.
Implementation and Analysis - We will be discussing applications related to:
• Modeling
• Inferential and hypothesis testing
• Predictive analytics
• New packages or procedures
• Analytics, big data, and unstructured data analytic methods
• Machine learning
• Implementing reproducible methods
• Evidence-guided statistical practice
Fri, Feb 19
9:00 AM - 11:00 AM
Virtual
T2 - Bayesian Analytics in Practice
Tutorial
Instructor(s): Sujit Kumar Ghosh, North Carolina State University; Amy Shi, SAS Institute, Inc.
The Bayesian paradigm provides a natural and practical way to build analytical models by expressing complicated models through a sequence of simple conditional models, which makes it useful for data structures ranging from simple to complex. This tutorial will begin with a simple introduction to Bayesian hierarchical models and then expand to more realistic and complex models that have recently emerged in the machine learning literature. All of these models will be illustrated through practical applications and worked-out examples without getting into the theoretical underpinnings. Participants with basic knowledge of probability theory and the statistical inferential framework will find the tutorial useful in expanding their standard toolkit to advanced use of Bayesian analytical methods. The concepts and methods discussed are demonstrated using the software (R and SAS) developed by the presenters, but they are applicable to any modern Bayesian software package.
Outline & Objectives
Part I - Introduction to Bayesian Hierarchical Models
1. Basic components of Priors, Likelihood and Posterior;
2. Predictive Distributions;
3. Computational Methods using Monte Carlo Simulations
Part II - Primer on R and SAS
1. JAGS through R
2. SAS through PROC MCMC and PROC BGLIMM
Part III - Hierarchical Models in Practice
1. Linear and generalized linear models;
2. Multi-level models;
3. Penalized regression models with missing data
This tutorial aims to familiarize attendees with the essential concepts and computational methods of Bayesian analytics in the areas where hierarchical modeling is conducted. They will learn how to deal with practical issues that arise from Bayesian analysis, especially those in multilevel modeling. Another major goal is to help attendees become comfortable with using software to conduct Bayesian inference using machine learning models.
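The Part I components (prior, likelihood, posterior, predictive distribution, and Monte Carlo simulation) can be illustrated with a minimal sketch of a conjugate Beta-Binomial model in plain Python. This is not the tutorial's own code and does not use JAGS, PROC MCMC, or PROC BGLIMM. The prior parameters, the data (7 successes in 10 trials), and the function name are hypothetical values chosen for illustration.

```python
import random


def beta_binomial_posterior(prior_a, prior_b, successes, trials):
    """Return posterior Beta parameters for a Beta prior and Binomial likelihood."""
    return prior_a + successes, prior_b + (trials - successes)


rng = random.Random(2021)  # fixed seed so the sketch is repeatable

# Hypothetical prior and data.
prior_a, prior_b = 2.0, 2.0
successes, trials = 7, 10

post_a, post_b = beta_binomial_posterior(prior_a, prior_b, successes, trials)
exact_mean = post_a / (post_a + post_b)

# Monte Carlo: draw from the posterior and average the draws.
draws = [rng.betavariate(post_a, post_b) for _ in range(20000)]
mc_mean = sum(draws) / len(draws)

# Posterior predictive: simulate one future trial per posterior draw.
predictive = sum(1 for theta in draws if rng.random() < theta) / len(draws)

print("posterior parameters:", post_a, post_b)
print("exact posterior mean:", exact_mean)
print("Monte Carlo posterior mean:", mc_mean)
print("simulated probability of a future success:", predictive)
```

In this conjugate case the posterior is available in closed form, so the simulation only serves as a check. For the hierarchical models in Part III no closed form is generally available, which is where simulation-based software becomes necessary.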
About the Instructor
Professor Sujit Kumar Ghosh is currently a Full Professor in the Department of Statistics at North Carolina State University (NCSU). He has over 25 years of experience in conducting, applying, evaluating and documenting statistical analysis of biomedical and environmental data. Prof. Ghosh is actively involved in teaching, supervising and mentoring graduate students at the doctoral and master's levels. He has supervised over 35 doctoral graduate students and recently published a popular book titled "Bayesian Statistical Methods", co-authored with Brian Reich, which is being used as a textbook at several universities. He is an elected fellow of the ASA and has also served as the Deputy Director at SAMSI (NC).
Amy Shi is a senior research statistician developer in the Advanced Statistical Methods Department at SAS Institute Inc. Her main responsibility is developing and enhancing the Bayesian capabilities of SAS software, with a focus on generalized linear mixed models, discrete choice models, and multilevel hierarchical settings. She is the developer of the BGLIMM procedure. She has a PhD in biostatistics from the University of North Carolina at Chapel Hill.
Relevance to Conference Goals
Bayesian methods are becoming ever more popular in many applied fields. We have designed the tutorial from a practical point of view, covering a wide range of commonly encountered hierarchical analytical models applied to several different data structures encountered in practice (from simple rectangular data structures to complex data structures with missing and/or censored observations). The course emphasizes the practical aspects of Bayesian computational methods using industry-standard software. The how-to part of the course is presented using a variety of worked-out examples from different applied settings, with code explained in detail.
Fri, Feb 19
9:00 AM - 11:00 AM
Virtual
T3 - Tidyverse Tools in R for Data Science and Statistical Inference
Tutorial
Instructor(s): Chester Ismay, DataRobot; Jessica Minnier, Oregon Health & Science University
Many statisticians use R to clean, manage, and analyze data. Recently, a philosophy promoting tidy data (uniform in shape, one observation per row, one variable per column) and tidy code (readable, consistent across tasks) has risen in popularity. The “tidyverse” is a term for a collection of R packages that embrace this philosophy and are designed to work together to improve readability of code and reproducibility of workflows. The tidyverse is especially accessible to beginners as it allows students to dive into writing tidy code and quickly perform data wrangling, data reshaping, and data visualization tasks. It is also useful in professional data science and statistics as it provides a cohesive architecture for analyzing tidy data. This workshop will introduce tidyverse core and community packages for data processing and visualization (e.g. dplyr, janitor, ggplot2) and showcase new packages (moderndive, infer) designed to extend the tidyverse to a common framework for statistical inference. Using practical examples with large and complex R datasets (e.g. gapminder), we’ll show how to use tidyverse tools to create readable and accessible code for novices and experts alike.
Outline & Objectives
Participants will learn how to tackle the data analysis workflow from start (wrangling, tidying) to finish (visualization, analysis, inference) using tidyverse tools in R. Participants will be able to incorporate any of these tools into their own coding practices as each component can stand alone and also work seamlessly together as a whole. Topics include:
Data Wrangling using the dplyr package
Data Visualization using the ggplot2 package
Data Tidying using the tidyr and janitor packages
Resampling using the moderndive package with dplyr
Statistical inference using the infer package
Intended level: R novices to intermediate R users. Advanced R users may also find interest here as a simplified way to perform statistical inference from a simulation-based approach.
Prerequisites: Some experience with R is helpful. This workshop is interactive, so participants are encouraged to bring a laptop with the latest versions of R and RStudio installed. This workshop will only use free and open-source software.
About the Instructor
Chester Ismay leads in-person workshops on data science, machine learning, and data engineering as a Data Science Evangelist for DataRobot University at DataRobot. He obtained a PhD in Statistics from Arizona State University and has taught courses and led workshops in mathematics, computer science, statistics, data science, and sociology. He is co-author of the fivethirtyeight, infer, and moderndive R packages and is author of the thesisdown R package. He is also a co-author of Statistical Inference via Data Science: A ModernDive into R and the Tidyverse, an open-source textbook for introductory statistics and data science using R.
Jessica Minnier is an Associate Professor of Biostatistics at OHSU who collaborates with researchers and clinicians in the medical and healthcare fields. She uses R for data cleaning, analysis, and visualization. She has experience teaching statistical methods and programming to graduate students in statistics, public health, biology and medicine, as well as to her statistician peers. She has helped develop R and statistics educational materials, including an interactive R Bootcamp, various recorded R workshops, and interactive R Shiny tutorials.
Relevance to Conference Goals
The tidyverse is designed to provide code that is easily readable, which greatly improves collaboration and communication. These methods can be implemented across many disciplines and can improve programming confidence for beginners. In addition, performing hypothesis tests and constructing confidence intervals is at the heart of many applied statistician roles. The presented moderndive and infer packages extend the tidyverse philosophy so that statistical inference also follows a friendly, explicit syntax. This workshop will teach participants a new way to think about these inferential concepts using R packages designed to make their analyses simpler, more reproducible, and more easily modifiable for use in multiple projects. The concepts of bootstrapping and permutation tests, as shown computationally in the infer and moderndive packages, allow for professional development and growth in the field of statistical inference. Connections can also be made to traditional tests like the t-test and ANOVA through use of the infer package and other tidyverse packages. Participants can extend this approach to their own analyses and showcase it in their organization and in their own work.
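The simulation-based ideas mentioned above (permutation tests and bootstrapping) can be sketched in a few lines of plain Python. This is a minimal sketch and not the API of the infer or moderndive R packages. The two groups of measurements, the function names, and the number of resamples are hypothetical and chosen only for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_permutations, rng):
    """Two-sided permutation test for a difference in group means."""
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (extreme + 1) / (n_permutations + 1)


def bootstrap_interval(group_a, group_b, n_resamples, rng, level=0.95):
    """Percentile bootstrap interval for the difference in group means."""
    diffs = sorted(
        mean(rng.choices(group_a, k=len(group_a)))
        - mean(rng.choices(group_b, k=len(group_b)))
        for _ in range(n_resamples)
    )
    tail = (1.0 - level) / 2.0
    lower = diffs[int(tail * n_resamples)]
    upper = diffs[int((1.0 - tail) * n_resamples) - 1]
    return lower, upper


rng = random.Random(19)  # fixed seed so the sketch is repeatable

# Hypothetical measurements for two groups.
group_a = [12.1, 11.8, 12.5, 12.9, 12.3, 12.7]
group_b = [10.9, 11.2, 10.7, 11.5, 11.0, 11.3]

observed_diff = mean(group_a) - mean(group_b)
p_value = permutation_p_value(group_a, group_b, 5000, rng)
lower, upper = bootstrap_interval(group_a, group_b, 5000, rng)

print("observed difference in means:", observed_diff)
print("permutation p-value:", p_value)
print("95% bootstrap interval:", (lower, upper))
```

The permutation test shuffles group labels to simulate the null hypothesis of no difference, while the bootstrap resamples within each group to approximate the sampling variability of the estimate.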
Fri, Feb 19
9:00 AM - 11:00 AM
Virtual
T4 - Introduction to BlueSky Statistics
Tutorial
Instructor(s): Robert Anthony Muenchen, University of Tennessee
BlueSky Statistics is an open-source graphical user interface to the R language. It uses menus and dialog boxes to manage data, create plots, and analyze data. By default, the R code that it writes is hidden from the user, but can also be displayed and modified before execution.
For non-programmers, BlueSky helps get their work done. For students of R, it helps them learn R code. For advanced users, it offers a way to convert code into dialogs that allow more effective interaction between programmers and non-programmers within an organization.
Outline & Objectives
1) Importing Data from Excel, SAS, SPSS, Stata, Web Survey tools, etc.
2) Managing Data
a) Computing new variables
b) Conditional transformations
c) Factor level management (uses forcats package)
d) Missing value imputation (uses simputation package)
e) Ranking
f) Recoding
g) Reshaping data (uses tidyr package)
h) Subsetting data
i) Splitting – test/train & group by processing
3) Graphics (uses ggplot2 package)
a) Grammar of Graphics concepts (faceting, etc)
b) Bar charts
c) Density
d) Frequency
e) Histograms
f) Line charts
g) Q-Q plots
h) Scatterplots
4) Basic Analysis
a) Summaries & frequencies
b) Crosstabulation
c) Group comparisons (parametric & non-parametric)
d) Tables – easily generate complex tables of statistics and statistical tests (uses the arsenal package)
5) Model Building
a) Linear regression
b) Logistic regression
c) ANOVA
6) Model Tuning
a) Overview of machine learning methods available
b) Overview of cross-validation methods available
7) Model Statistics – scoring, ROC curves, various metrics & diagnostics
8) Setting options – APA journal style, significant digits, etc.
9) Reproducibility & R Markdown
10) Syntax editor
About the Instructor
Robert A. Muenchen is the author of R for SAS and SPSS Users, and co-author of R for Stata Users and An Introduction to Biomedical Data Science. He is also the creator of r4stats.com, a popular web site devoted to analyzing trends in data science software, reviewing such software, and helping people learn the R language.
Bob is an ASA Accredited Professional Statistician™ who focuses on helping organizations migrate from SAS, SPSS, and Stata to the R Language. He has taught workshops on data science topics for more than 500 organizations and has presented workshops in partnership with the American Statistical Association, RStudio, DataCamp.com, and Revolution Analytics. Bob has written or co-authored over 70 articles published in scientific journals and conference proceedings and has provided guidance on more than 1,000 graduate theses and dissertations at the University of Tennessee.
Relevance to Conference Goals
Many organizations have statisticians and data scientists who are expert programmers. They also have staff members who need to analyze data, but who lack the time or inclination to become good programmers. Software like BlueSky Statistics can help these two types of analysts work together more effectively by using the same R software for their work. Programmers can extend BlueSky’s capabilities with menus and dialogs that control their R code. People wishing to learn R can do so by studying the code that BlueSky writes for them.