All Times ET
Wednesday, February 17
Wed, Feb 17
10:00 AM - 1:30 PM
Virtual
SC2 - What Would It Take to Change Your Inference? Quantifying the Discourse About Causal Inferences
Short Course (half day)
Instructor(s): Kenneth Frank, Michigan State University
Statistical inferences are often challenged because of uncontrolled bias. There may be bias due to uncontrolled confounding variables or non-random selection into a sample. We turn concerns about potential bias into questions about how much bias there must be to invalidate an inference. For example, challenges such as “But the inference of a treatment effect might not be valid because of pre-existing differences between the treatment groups” are transformed into questions such as “How much bias must there have been due to uncontrolled pre-existing differences to make the inference invalid?” By reframing challenges about bias in terms of specific quantities, this course will contribute to scientific discourse about the uncertainty of causal inferences. Critically, while there are other approaches to quantifying the sensitivity of inferences, the approaches presented in this workshop, based on correlations of omitted variables (Frank, 2000) and the replacement of cases (Frank and Min, 2007; Frank et al., 2013), have great intuitive appeal. In this sense, the techniques give practicing statisticians a language for communicating with a broad audience about the uncertainty of inferences.
Outline & Objectives
Outline:
In Part I, we use Rubin’s causal model to interpret how much bias there must be to invalidate an inference in terms of replacing observed cases with counterfactual cases or cases from an unsampled population (e.g., Frank et al., 2013). In Part II, we quantify the robustness of causal inferences in terms of correlations associated with unobserved variables or unsampled populations (e.g., Frank, 2000). Calculations will be presented using the app at http://konfound-it.com, with links to Stata and R modules. The format will be a mixture of presentation, individual exploration, and group work.
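For a flavor of the calculations, here is a minimal sketch assuming the konfound package for R (the R module associated with the app); the numbers are hypothetical:

    # install.packages("konfound")
    library(konfound)

    # Suppose a regression reports an estimated treatment effect of 2 with a
    # standard error of 0.4, from n = 100 observations and 3 covariates.
    # pkonfound() quantifies how much bias would be needed to invalidate the
    # inference (e.g., what fraction of cases would have to be replaced).
    pkonfound(est_eff = 2, std_err = 0.4, n_obs = 100, n_covariates = 3)

    # konfound() runs the same analysis directly on a fitted model:
    # m <- lm(y ~ treatment + x1 + x2, data = mydata)  # hypothetical data
    # konfound(m, treatment)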
Objectives:
1) Apply and understand techniques for quantifying the robustness of causal inferences.
2) Run the macros in Stata or R, in Excel, or via an online app.
3) Develop a deeper understanding of regression and the counterfactual, as well as of how threats to internal and external validity compare against the strength of evidence.
About the Instructor
Kenneth Frank received his Ph.D. in measurement, evaluation, and statistical analysis from the School of Education at the University of Chicago in 1993. He is an MSU Foundation Professor of Sociometrics at Michigan State University. His substantive interests include the study of schools as organizations, the social structures of students and teachers, school decision-making, and social capital. His substantive areas are linked to several methodological interests: social network analysis, sensitivity analysis and causal inference (http://konfound-it.com), and multi-level models. His methodological work on sensitivity analysis is published in Sociological Methods and Research; the Journal of Educational and Behavioral Statistics; Sociological Methodology; and Educational Evaluation and Policy Analysis. The work is widely cited across the social and natural sciences (e.g., Proceedings of the National Academy of Sciences, Administrative Science Quarterly, American Sociological Review, Journal of the Royal Statistical Society).
Relevance to Conference Goals
Quantifying the robustness of an inference in very accessible terms gives the practicing statistician an intuitive language for conveying the uncertainty of an inference. It allows those interacting with the statistician to weigh the strength of evidence relative to concerns about bias and returns on investment. In this sense it provides expressions of evidence to inform practice in a policy or clinical context.
Wed, Feb 17
10:00 AM - 1:30 PM
Virtual
SC3 - Navigating Tough Conversations in Statistical Collaboration
Short Course (half day)
Instructor(s): Julia Sharp, Colorado State University; Zach Weller, Colorado State University
Statistical practitioners face difficult conversations in their interactions with clients and collaborators. The topics of these conversations vary widely, from completion timelines to the use and interpretation of p-values. While there are no universal guidelines for navigating tough conversations, thoughtful discussion of common experiences and lessons learned, reflection on differences among individuals and situations, and exercises such as role playing can help participants prepare and build confidence for engaging in future tough conversations. In this course, we will build participants’ confidence to communicate effectively with clients and customers when challenging topics or situations arise. Specifically, we will: (1) give and solicit examples of difficult conversations often encountered in statistical collaboration, (2) provide suggestions for approaching and engaging in these difficult conversations through multiple interactive activities, and (3) engage participants in interactive sessions where they learn from each other through discussion, role playing, and conversations motivated by participants’ questions and by videos portraying several difficult conversations.
Outline & Objectives
Outline:
Welcome/Intro (20 minutes)
Conversation (1 hour): discussion among instructors and participants to define “difficult conversations,” share experiences, identify participants’ communication strengths, and examine how they currently manage challenging conversations
Short Break (10 minutes)
Focused Discussion (2 hours): exploration of specific scenarios through role-playing, discussion, and analysis of videos of meetings between a researcher and a statistical collaborator
Closing Discussion (20 minutes): remaining participant questions and reflection
Objectives:
(A) Build confidence for engaging in difficult conversations by improving skills for navigating them, while respecting individual differences in communication strategies, professional settings, and relationships.
(B) Cultivate communication skills for having difficult conversations on both technical and professional topics.
(C) Reflect on participants’ strategies for communication in the context of their careers and their jobs’ expectations.
(D) Create a sense of community among participants and start to build a support network for continued discussion and reflection after the course.
About the Instructor
Proposed instructors: Julia Sharp (CSU); Emily Griffith (NCSU); Megan Higgs (Critical Inference LLC); Zach Weller (CSU)
The four instructors for this course have extensive statistical collaboration expertise and PhDs in statistics. The ASA funded a subset of the instructors to create statistical collaboration training videos in which scenarios of tough conversations between a statistician and a researcher are portrayed. The instructors will circulate among participants to facilitate and motivate conversations.
Relevance to Conference Goals
A major impact of this short course is participants’ increased confidence in communicating effectively with clients and customers when challenging topics or situations arise. The course will build that confidence by providing participants with skills and strategies for navigating difficult conversations, with examples covering both professional and technical topics. It will also foster a sense of community among participants and start to build a support network, especially for isolated statisticians, for continued discussion.
Wed, Feb 17
10:00 AM - 1:30 PM
Virtual
SC4 - Missing Data Methods for (Un)Commonly Used Statistics
Short Course (half day)
Instructor(s): Emile Latour, Oregon Health & Science University; Miguel Marino, Oregon Health & Science University
Missing data is a challenge that faces all practicing statisticians. Simple ad hoc methods such as listwise deletion may produce biased results or cause a loss of statistical power, leading to incorrect conclusions. This course will describe missing data methods that have been established to overcome these challenges. We will present approaches for visually displaying patterns of missing data, explain missing data mechanisms, and show how to perform multiple imputation within a regression framework in R/SAS/Stata. We will also present how to adjust statistical software syntax to perform missing data methods on less common statistics that do not have built-in software methods (e.g. kappa statistics, proportion of variance explained, survival probability). We will illustrate methods using electronic health records data. Although the data example is drawn from healthcare, the general methods are transferable to other disciplines. Those with limited experience with missing data will benefit from an introduction to this topic. Experienced practitioners will also benefit from seeing how missing data methods may be adapted for statistics that cannot be derived from conventional regression models.
Outline & Objectives
This short course will: 1) review missing data mechanisms that give rise to missing data patterns (e.g. missing completely at random (MCAR), missing at random (MAR), and not missing at random (NMAR)); 2) describe approaches for visualizing missing data patterns so that practitioners can communicate with collaborators and make informed decisions about how to handle the missing data; 3) introduce widely available methods (e.g. multiple imputation) for dealing with missing data, including their strengths and weaknesses; and 4) present how to adjust statistical software syntax to perform missing data methods on uncommon statistics. This workshop will be valuable to statisticians and budding statisticians in any industry or discipline who work with multivariable data.
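As a flavor of the R workflow, here is a minimal multiple-imputation sketch using the widely used mice package (one of several implementations; the course covers R, SAS, and Stata):

    library(mice)
    data(nhanes)                 # small incomplete data set shipped with mice
    md.pattern(nhanes)           # display the missing-data patterns
    imp  <- mice(nhanes, m = 20, seed = 1, printFlag = FALSE)  # 20 imputations
    fits <- with(imp, lm(chl ~ age + bmi))  # fit the model in each completed set
    summary(pool(fits))          # combine the estimates via Rubin's rules

    # For a statistic without built-in pooling support (e.g. a kappa), compute
    # it in each completed data set and combine the m estimates manually:
    # ests <- sapply(1:20, function(i) my_stat(complete(imp, i)))  # my_stat is hypothetical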
About the Instructor
Miguel Marino, PhD is Associate Professor of Biostatistics in the Department of Family Medicine at Oregon Health & Science University (OHSU), with a joint appointment in the OHSU-PSU School of Public Health. Dr. Marino's research focuses on the implementation of novel statistical methodology to address complexities associated with the use of medical electronic health records, including issues of missing data. Dr. Marino has co-authored over 125 peer-reviewed publications and has served as co-investigator or site PI on over 20 federally funded grants from a diverse set of funders (e.g. NIH and CDC). Dr. Marino currently serves as the Publications Officer for the Health Policy Statistics Section of the ASA and as the statistical editor for the Annals of Family Medicine journal. Co-instructor Emile Latour, MS, is an associate biostatistician with the OHSU Knight Cancer Institute, where he provides extensive ongoing applied statistical support to a variety of cancer researchers and their projects. In 2018, Emile presented his work on approaches to dealing with missing data for non-traditional statistics at the ASA Conference on Statistical Practice in Portland, OR.
Relevance to Conference Goals
This workshop is relevant to the Implementation and Analysis theme. It is more common for the applied statistician to encounter missing data than not. This short course will introduce the applied statistician to simple-to-implement methods to account for missing data that they could encounter in their job. We will focus on multiple imputation for basic statistical models but also introduce multiple imputation for non-standard statistics (e.g. kappa statistics), which are not fully supported in standard software. Through our examples, we hope to provide guidance and a reference for others who want to apply these methods to non-standard statistics. Throughout the workshop, we will interpret and adapt established missing data techniques from the statistical literature to practical problems we have faced.
Wed, Feb 17
2:00 PM - 5:30 PM
Virtual
SC6 - Principles of Prediction and Inference in Machine Learning
Short Course (half day)
Instructor(s): Jeffrey D. Blume, Vanderbilt University; Thomas G. Stewart, Vanderbilt University School of Medicine
Machine learning and prediction methods are now ubiquitous in popular culture and academic research. While many popular prediction algorithms were developed outside of statistics, statisticians are expected to understand these algorithms, their principles, and their behavior. In addition, statisticians are often tasked with making inferences in the context of a complex prediction model. The purpose of this short course is to (1) familiarize practitioners with essential principles for prediction and inference tasks using machine learners, (2) explain the reliance on well-defined operating characteristics, particularly out-of-sample optimism and coverage, (3) demonstrate how to compare and contrast the operating characteristics of machine learning and statistical models, and (4) promote the habit of using two aligned models, a prediction model and an inferential model, to meet specific scientific needs. We will stress the connection between prediction and attribution, emphasizing that prediction is often an easier task that comes at the expense of the ability to attribute predictive power to a particular feature. Sustained examples in R and group discussion are an integral part of the course.
Outline & Objectives
Model-building practices that benefit prediction tasks do not always benefit inferential tasks. The reverse is also true, making prediction and inference difficult to conduct under a single model. This course is intended to provide a framework for understanding model performance, with special attention to the differences between prediction and inference. The course is organized around the concept of operating characteristics, which is the currency by which models (both prediction and inferential) are often evaluated. We introduce and discuss concepts of out-of-sample predictive accuracy, their estimators via k-fold cross-validation and bootstrapping, and optimism concepts for the prediction setting. For the inference setting, we will focus on bias, MSE, testing, and interval coverage concepts of estimators that retain meaning in complex models. We demonstrate a general-purpose approach to calculating operating characteristics regardless of the specific family of models, e.g., (regularized) regression models, support vector machines, gradient boosted models, random forests, and neural networks. Mathematical details will be skipped in favor of applied examples using R.
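As an illustrative sketch (our example, not the course's materials): estimating out-of-sample mean squared error by 5-fold cross-validation in base R, using the built-in mtcars data:

    set.seed(1)
    dat  <- mtcars                      # any data frame with a numeric response
    fold <- sample(rep(1:5, length.out = nrow(dat)))
    cv_mse <- sapply(1:5, function(k) {
      fit  <- lm(mpg ~ ., data = dat[fold != k, ])      # train on four folds
      pred <- predict(fit, newdata = dat[fold == k, ])  # score the held-out fold
      mean((dat$mpg[fold == k] - pred)^2)
    })
    mean(cv_mse)                                # out-of-sample error estimate
    mean(residuals(lm(mpg ~ ., data = dat))^2)  # apparent (in-sample) error
    # Optimism is roughly the gap between the two: the apparent error
    # understates the out-of-sample error.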
About the Instructor
Dr. Thomas G. Stewart is Assistant Professor of Biostatistics and core faculty at the Data Science Institute at Vanderbilt. He developed a new computational and resampling curriculum for teaching statistics to data scientists. He has extensive expertise in prediction models (especially support vector machines), missing data, and regularization methods.
Dr. Blume is Vice-Chair for Education in the Department of Biostatistics and Director of Graduate Education at the Data Science Institute. He founded the Data Science Master’s program. His lab focuses on machine learning and prediction, and the role of principles of inference in large-scale settings.
Dr. Mathew Shotwell is Associate Professor of Biostatistics at Vanderbilt University and core faculty at the Data Science Institute. He has taught statistical learning in the Biostatistics graduate program for over 5 years and is currently teaching machine learning in the Data Science program.
Megan Hollister is a PhD student in Dr. Blume’s lab. She is developing methods for attributing predictive accuracy to features in complex models. She is also developing an R package for broad computation of false discovery rates.
Relevance to Conference Goals
The goal of this course is to familiarize practitioners with essential concepts for evaluating prediction models and to relate those to critical concepts for inferential tasks. At the conclusion of the course, attendees should be able to:
(1) Understand the different objectives of prediction and inference models/tasks.
(2) Identify the operating characteristics of primary importance for prediction and, similarly, for inference.
(3) Simulate operating characteristics for simple prediction and inference models (a minimal sketch follows this list).
(4) Recognize the pitfalls of variable selection techniques when constructing models for inference.
(5) Distinguish between in-sample and out-of-sample performance and understand the related concept of optimism.
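For objective (3), a minimal simulation sketch in base R (our illustration): estimating the coverage of the 95% confidence interval for a slope in a correctly specified linear model:

    set.seed(1)
    covered <- replicate(2000, {
      x  <- rnorm(50)
      y  <- 1 + 0.5 * x + rnorm(50)      # true slope = 0.5
      ci <- confint(lm(y ~ x))["x", ]    # default 95% interval
      ci[1] <= 0.5 && 0.5 <= ci[2]
    })
    mean(covered)                        # should be close to the nominal 0.95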
These are critical skills for applied statisticians and will help practitioners better interface with machine learners out in the wild.
Wed, Feb 17
2:00 PM - 5:30 PM
Virtual
SC7 - How to Lead Through Change and Build High-Performing Teams
Short Course (half day)
Instructor(s): Angela Demaree, PAWS Consulting, LLC
A half-day workshop introducing participants to high-performance habits, tools, and techniques, with a focus on leading through change and building high-performing teams.
Participants will leave with simple yet effective tools that can easily be implemented in their personal and professional lives.
Outline & Objectives
Goal: To introduce participants to high-performance principles and techniques for reaching heightened levels of clarity, energy, courage, productivity, and influence at work and in life, with a focus on leading through change and building high-performing teams in a professional setting.
Outline:
Understanding High Performance, Boundaries, Values, and Burnout
Clarity: Find Your ‘Why’
Energy: Tools for maintaining high levels of energy throughout your busy day
Courage: Overcoming fear and overwhelm to lead through change
Productivity: How do I do it ALL and lead a team?
Influence is not a four-letter word (how to build high-performing teams)
Staying positive, keeping an optimistic outlook, leading through change
Engaging Employees Through Purpose
High Performance Recap
About the Instructor
Dr. Angela Demaree, a veterinarian and veteran, currently serves as the CEO and Principal Consultant for PAWS Consulting, a public health and political consulting firm. Angela recently retired as a Major in the U.S. Army Reserves; she deployed in 2012 in support of Operation Enduring Freedom, where she learned strategic planning tools and techniques.
As the Equine Medical Director of the Indiana Horse Racing Commission, she successfully led and managed forty part-time intermittent employees through institutional change. She has her Master of Public Health in Biostatistics and Epidemiology from the University of Southern California’s Keck School of Medicine and is a Certified High Performance Coach.
Angela is a member of the American Statistical Association and currently serves on the Purdue University College of Veterinary Medicine's Alumni Board, the Indiana Animal Health Foundation Board, the Indiana Veterinary Medical Association’s Innovation Task Force, and the Legislative Working Group.
She spends her free time with her horse, Tommy, and teaching her Quaker parrot the Purdue fight song. You can connect with Angela on Twitter and LinkedIn at @DemareeDVM.
Relevance to Conference Goals
Participants will learn key leadership skills they can take back to their organizations, how to build high-performing teams, and how to lead through change and uncertainty.
Wed, Feb 17
2:00 PM - 5:30 PM
Virtual
SC8 - Bootstrap Methods and Permutation Tests
Short Course (half day)
Instructor(s): Tim Hesterberg, Google
We begin with a graphical approach to bootstrapping and permutation testing, illuminating basic statistical concepts of standard errors, confidence intervals, p-values and significance tests.
We consider a variety of statistics (mean, trimmed mean, regression, etc.), and a number of sampling situations (one-sample, two-sample, stratified, finite-population), stressing the common techniques that apply in these situations. We'll look at applications from a variety of fields, including telecommunications, finance, and biopharm.
These methods let us construct confidence intervals and hypothesis tests when formulas are not available. This lets us do better statistics, e.g., use robust methods such as a median or trimmed mean instead of a mean. They can help clients understand statistical variability. And some of the methods are more accurate than standard methods.
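For example, here is a minimal base-R sketch of the basic procedure (the data are illustrative): the bootstrap standard error and a simple percentile interval for a 25% trimmed mean:

    set.seed(1)
    x <- rexp(40)                        # a skewed sample of size 40
    boot <- replicate(10000,
                      mean(sample(x, replace = TRUE), trim = 0.25))
    sd(boot)                             # bootstrap standard error
    quantile(boot, c(0.025, 0.975))      # simple percentile interval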
Outline & Objectives
Introduction to Bootstrapping
General procedure
Why does bootstrapping work?
Sampling distribution and bootstrap distribution
Bootstrap Distributions and Standard Errors
Distribution of the sample mean
Bootstrap distributions of other statistics
Simple confidence intervals
Two-sample applications
How Accurate Is a Bootstrap Distribution?
Bootstrap Confidence Intervals
Bootstrap percentiles as a check for standard intervals
More accurate bootstrap confidence intervals
Significance Testing Using Permutation Tests
Two-sample applications
Other settings
Wider variety of statistics
Variety of applications
Examples where things go wrong, and what to look for
Wider variety of sampling methods
Stratified sampling, hierarchical sampling
Finite population
Regression
Time series
Participants will learn how to use resampling methods:
* to compute standard errors,
* to check the accuracy of the usual Gaussian-based methods,
* to compute both quick and more accurate confidence intervals,
* for a variety of statistics and
* for a variety of sampling methods, and
* to perform significance tests in some settings (see the sketch after this list).
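A minimal base-R sketch of a two-sample permutation test of a difference in means (the data are illustrative):

    set.seed(1)
    a <- rnorm(20, mean = 1)             # group 1
    b <- rnorm(25, mean = 0)             # group 2
    obs    <- mean(a) - mean(b)          # observed test statistic
    pooled <- c(a, b)
    perm <- replicate(9999, {
      idx <- sample(length(pooled), length(a))   # random relabeling
      mean(pooled[idx]) - mean(pooled[-idx])
    })
    # two-sided p-value, counting the observed statistic itself
    (sum(abs(perm) >= abs(obs)) + 1) / (length(perm) + 1)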
About the Instructor
Dr. Tim Hesterberg is a Senior Data Scientist at Google. He previously worked at Insightful (S-PLUS), Franklin & Marshall College, and Pacific Gas & Electric Co. He received his Ph.D. in Statistics from Stanford University, under Brad Efron.
Hesterberg wrote "What Teachers Should Know about the Bootstrap: Resampling in the Undergraduate Statistics Curriculum," The American Statistician (2015) (really, it is for every statistician); co-authored, with Laura Chihara, "Mathematical Statistics with Resampling and R," 2nd edition (Wiley, 2018); wrote the "resample" package for R; and was the primary author of the "S+Resample" package for bootstrapping, permutation tests, the jackknife, and other resampling procedures.
Hesterberg is on the executive boards of the National Institute of Statistical Sciences and the Interface Foundation of North America (Interface between Computing Science and Statistics).
He teaches kids to make water bottle rockets, and actively fights climate chaos.
His home page is at http://www.timhesterberg.net/bootstrap, and a humorous bio is at https://research.google/people/TimHesterberg.
Relevance to Conference Goals
Resampling methods are important in statistical practice, but are omitted or poorly covered in many old-style statistics courses. These methods are an important part of the toolbox of any practicing statistician.
When using these methods, it is important to have some understanding of the ideas behind them and of when they should or should not be used.
They are not a panacea. People tend to think of bootstrapping in small samples, when they don't trust the central limit theorem. However, the common combination of the nonparametric bootstrap and percentile intervals is actually less accurate than t procedures in small samples. We discuss why, remedies, and better procedures that are only slightly more complicated.
These tools also show how poor common rules of thumb are -- in particular, n >= 30 is woefully inadequate for judging whether t procedures should be OK.
Wed, Feb 17
2:00 PM - 5:30 PM
Virtual
SC9 - Mixed Models: A Critical Tool for Dependent Observations
Short Course (half day)
Instructor(s): Elizabeth Claassen, SAS / JMP; Ruth M Hummel, SAS Institute / JMP Division
The use of fixed and random effects has a rich history. They often go by other names, including blocking models, variance component models, nested and split-plot designs, hierarchical linear models, multilevel models, empirical Bayes, repeated measures, covariance structure models, and random coefficient models. Mixed models are one of the most powerful and practical ways to analyze experimental data, and investing time to become skilled with them is well worth the effort. Many, if not most, real-life data sets do not satisfy the standard statistical assumption of independent observations, and failure to appropriately model design structure can easily result in biased inferences. With an appropriate mixed model, we can estimate the primary effects of interest as well as compare sources of variability, using common forms of dependence among sets of observations. Mixed models can readily become the handiest method in your analytical toolbox and provide a foundational framework for understanding statistical modeling in general.
In this course we will cover many types of mixed models, including blocking models, random coefficients, multilevel models, repeated measures, spatial models, GLMMs, and NLMMs.
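As a small taste in R (the course itself demonstrates these models in JMP and SAS; this lme4 analogue is our illustration): a random-coefficients model fit to the sleepstudy data that ships with lme4:

    library(lme4)
    data(sleepstudy)   # reaction times over days of sleep deprivation, by subject
    # Subject-specific random intercepts and slopes for Days:
    m <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
    summary(m)
    # A single random blocking effect would instead be specified as, e.g.,
    # lmer(y ~ trt + (1 | block), data = dat)   # hypothetical design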
Outline & Objectives
This course presents methodology and applications of mixed models. Material is at an applied level, accessible to those familiar with basic ANOVA and regression. We will cover:
1. Why use Mixed Models?
2. ANOVA with a Single Blocking Effect
3. Models with Factorial Treatment Designs
4. Multiple Random Effects
5. Regression, Random Coefficients, and Multilevel Models
6. Repeated Measures and Longitudinal Data
7. Spatial Models
8. Simulation and Power Analysis
9. Generalized Linear and Nonlinear Mixed Models
10. A Modern Take on Mixed Models
About the Instructor
Dr. Elizabeth A. Claassen is Senior Associate Research Statistician Developer in the JMP division of SAS. Dr. Claassen has 9 years’ experience with SAS software and 5 years’ experience with JMP. Her chief interest is generalized linear mixed models, and she brings to this work her expertise with SAS GLM, MIXED, GLIMMIX, and NLMIXED procedures for linear models. Dr. Claassen earned an MS and PhD in statistics from the University of Nebraska–Lincoln, where she received the Holling Family Award for Teaching Excellence from the College of Agricultural Sciences and Natural Resources. She is an author of the third edition of "SAS® for Mixed Models: An Introduction and Basic Applications" (2018).
Dr. Ruth Hummel is an Academic Ambassador with JMP (a division of SAS), supporting the technical needs of professors and instructors who use JMP for teaching and research. Dr. Hummel is an author of "Business Statistics and Analytics in Practice, 9th edition" (2018), and has been teaching and consulting about statistics and analytics for over a decade, at the University of Florida, at the US EPA, and now at SAS/JMP. She has a PhD in Statistics from the Pennsylvania State University.
Relevance to Conference Goals
Our proposed workshop is directly relevant to Theme 3: Implementation and Analysis. We intend to provide participants with the framework to see why mixed models are needed, the tools to adopt this methodology in practice, and experience comparing incorrectly specified models with appropriately specified ones, to understand the impact of applying mixed model methodology correctly in practice.
We will be discussing applications related to:
• Modeling
• Inference and hypothesis testing
• New packages or procedures
• Implementing reproducible methods
• Evidence-guided statistical practice