All Times ET

Wednesday, February 17

Wed, Feb 17
10:00 AM - 5:30 PM
Virtual

SC1 - The Productive Practitioner
Short Course (full day)

Instructor(s): David Shilane, Columbia University

Statistical practice can be greatly enhanced through effective use of software and advanced programming techniques. This course provides targeted instruction to help practitioners become more productive in analyzing data and developing projects. The class will be taught in the R statistical programming language.

Through four modules, the course will teach students to analyze large data sets, adopt reproducible methods for research and reporting, implement the best practices of software design, and generate dynamic applications in a web-style interface. Merging concepts and applications, the curriculum will touch on statistical problems in a variety of industries and develop skills in statistical consulting.

The course is designed to help participants improve the accuracy, transparency, and efficiency of their work. These benefits accrue to both practitioners and their organizations, enabling greater accountability and better use of information in collaborations. By developing these skills in programming and analysis, you can become a more productive practitioner.

Outline & Objectives

The full-day course will consist of the following modules.

1. Analyzing Large (and Small) Data Sets with R's data.table Package: Introducing basic and advanced techniques for processing data using data.table’s efficient methods.

2. Reproducible Research: Summarizing the important concepts in generating reproducible reports, highlighting advanced uses of the rmarkdown package, and demonstrating the improvements in productivity and accuracy associated with reproducible practices.

3. Software Design and Productivity: Showcasing the best practices from computer science in writing code and creating flexible designs to anticipate changes in the goals of a project.

4. Dynamic Reporting Engines with R's shiny Package: Developing user interfaces and reactive code to display customized content in a web-friendly design.

Each of these modules will focus on a number of core skills:

a) Writing effective code.

b) Anticipating the challenges of working with data in iterative projects.

c) Creating customized content for a wide range of audiences.

d) Utilizing these techniques in statistical practice and collaboration with others.

About the Instructor

David Shilane is a member of the full-time faculty of Columbia University’s Applied Analytics program. He teaches master’s level courses in machine learning, research methods, and data science consulting. As a practitioner, he has collaborated with numerous companies and organizations to design data-driven products and applications, build analytical infrastructures, and inform strategic decisions. He has worked in diverse fields including online dating, product recommendations, public health initiatives, and educational technology. David is excited to convey these experiences, from statistical methods and technical implementations to new discoveries derived from data, both inside and outside of the classroom. He has published research in a variety of fields, including medicine and public health, educational technology, applied statistics, optimization methods, and statistical software. He received a PhD in Biostatistics from the University of California, Berkeley and prior degrees from Stanford University.

Relevance to Conference Goals

The conference is designed to facilitate new learning and career development for statistical practitioners. This course is designed to expand participants’ range and abilities in using statistical software, building core competencies and introducing advanced uses. Much of this material is drawn from a data science consulting curriculum that the instructor developed from lessons learned as a practitioner. While the course focuses on software design and statistical programming, its topics will be introduced in a context that enhances participants’ knowledge of new domains, reinforces concepts from statistical analysis, and develops managerial skills that are useful in statistical consulting.

Wed, Feb 17
10:00 AM - 1:30 PM
Virtual

SC2 - What Would It Take to Change Your Inference? Quantifying the Discourse About Causal Inferences
Short Course (half day)

Instructor(s): Kenneth Frank, Michigan State University

Statistical inferences are often challenged because of uncontrolled bias. There may be bias due to uncontrolled confounding variables or non-random selection into a sample. We turn concerns about potential bias into questions about how much bias there must be to invalidate an inference. For example, challenges such as “But the inference of a treatment effect might not be valid because of pre-existing differences between the treatment groups” are transformed into questions such as “How much bias must there have been due to uncontrolled pre-existing differences to make the inference invalid?” By reframing challenges about bias in terms of specific quantities, this course will contribute to scientific discourse about the uncertainty of causal inferences. Critically, while there are other approaches to quantifying the sensitivity of inferences, the approaches presented in this workshop, based on correlations of omitted variables (Frank, 2000) and the replacement of cases (Frank and Min, 2007; Frank et al., 2013), have great intuitive appeal. In this sense the techniques give practicing statisticians a language for communicating with a broad audience about the uncertainty of inferences.
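To make the reframing concrete, here is a minimal Python sketch of one such quantity: the share of an estimated effect that would have to be due to bias before the estimate drops to its significance threshold. It assumes a positive estimate, a two-sided test, and the normal critical value (about 1.96) as a large-sample stand-in for the t critical value; the function name is hypothetical, and this is not the code behind konfound-it.

```python
from statistics import NormalDist


def percent_bias_to_invalidate(estimate: float, std_error: float, alpha: float = 0.05) -> float:
    """Share of a positive estimate that would have to be due to bias for the
    bias-corrected estimate to sit exactly at the significance threshold.

    Assumes a two-sided test and uses the normal critical value as a
    large-sample stand-in for the t critical value (an assumption of this sketch).
    """
    if estimate <= 0 or std_error <= 0:
        raise ValueError("expects a positive estimate and a positive standard error")
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    threshold = critical * std_error                # smallest estimate that is still significant
    if estimate <= threshold:
        return 0.0                                  # not significant: nothing to invalidate
    return 1 - threshold / estimate


# Hypothetical effect of 6 with a standard error of 2 (z = 3).
share = percent_bias_to_invalidate(6.0, 2.0)
print(f"{share:.0%} of the estimate would have to be due to bias")
```

For an estimate of 6 with standard error 2, roughly 35% of the estimate would have to be due to bias; in the replacement-of-cases framing, that is roughly the share of cases that would have to be replaced with cases showing no effect.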

Outline & Objectives

Outline:
In part I, we use Rubin’s causal model to interpret how much bias there must be to invalidate an inference in terms of replacing observed cases with counterfactual cases or cases from an unsampled population (e.g., Frank et al., 2013). In part II, we quantify the robustness of causal inferences in terms of correlations associated with unobserved variables or in unsampled populations (e.g., Frank, 2000). Calculations will be presented using the app http://konfound-it.com, with links to Stata and R modules. The format will be a mixture of presentation, individual exploration, and group work.

Objectives:
1) Apply and understand techniques for quantifying the robustness of causal inferences.
2) Run macros in Stata, R, or Excel, or use an online app.
3) Develop a deeper understanding of regression and the counterfactual as well as how threats to internal and external validity compare against the strength of evidence.

About the Instructor

Kenneth Frank received his Ph.D. in measurement, evaluation and statistical analysis from the School of Education at the University of Chicago in 1993. He is MSU Foundation Professor of Sociometrics at Michigan State University. His substantive interests include the study of schools as organizations, social structures of students and teachers, school decision-making, and social capital. His substantive areas are linked to several methodological interests: social network analysis, sensitivity analysis and causal inference (http://konfound-it.com), and multilevel models. His methodological work on sensitivity analysis is published in Sociological Methods and Research; Journal of Educational and Behavioral Statistics; Sociological Methodology; and Educational Evaluation and Policy Analysis. The work is widely cited across the social and natural sciences (e.g., Proceedings of the National Academy of Sciences, Administrative Science Quarterly, American Sociological Review, Journal of the Royal Statistical Society).

Relevance to Conference Goals

Quantifying the robustness of an inference in very accessible terms gives the practicing statistician an intuitive language for conveying the uncertainty of an inference. It allows those interacting with the statistician to weigh the strength of evidence relative to concerns about bias and returns on investment. In this sense it provides expressions of evidence to inform practice in a policy or clinical context.

Wed, Feb 17
10:00 AM - 1:30 PM
Virtual

SC3 - Navigating Tough Conversations in Statistical Collaboration
Short Course (half day)

Instructor(s): Julia Sharp, Colorado State University; Zach Weller, Colorado State University

Statistical practitioners face difficult conversations in their interactions with clients and collaborators. The topics of these conversations vary widely, from completion timelines to the use and interpretation of p-values. While there are no universal guidelines for navigating tough conversations, thoughtful discussion of common experiences and lessons learned, reflection on differences among individuals and situations, and exercises such as role-playing can help participants prepare for and build confidence in future tough conversations. This course aims to build participants’ confidence to communicate effectively with clients and customers when challenging topics or situations arise. In this course, we will: (1) give and solicit examples of difficult conversations often encountered in statistical collaboration; (2) provide suggestions for approaching and engaging in these conversations through multiple interactive activities; and (3) engage participants in learning from each other through discussion, role-playing, and conversations motivated by participants’ questions and by videos portraying several difficult conversations.

Outline & Objectives

Welcome/Intro (20 minutes)

Conversation (1 hour): conversation among instructors and participants to define “difficult conversations,” share experiences, identify participants’ communication strengths, and discuss how participants currently manage challenging conversations.

Short Break (10 minutes)

Focused Discussion (2 hours): explore specific scenarios through role-playing, discussion, and analysis of videos of meetings between a researcher and a statistical collaborator.

Closing Discussion (20 minutes): answer remaining participant questions and reflect.

Objectives:
(A) Build confidence for engaging in difficult conversations by improving skills for navigating them, while respecting individual differences in communication strategies, professional settings, and relationships.
(B) Cultivate communication skills for having difficult conversations on both technical and professional topics.
(C) Reflect on participants’ communication strategies in the context of their careers and their jobs’ expectations.
(D) Create a sense of community among participants and start to build a support network for continued discussion and reflection after the course.

About the Instructor

Proposed instructors: Julia Sharp (CSU); Emily Griffith (NCSU); Megan Higgs (Critical Inference LLC); Zach Weller (CSU)

The four instructors for this course have extensive statistical collaboration expertise and PhDs in Statistics. The ASA funded a subset of the instructors to create statistical collaboration training videos in which scenarios of tough conversations between a statistician and a researcher are presented. The instructors will circulate among participants to facilitate and motivate conversations.

Relevance to Conference Goals

A major impact of this short course is the participants’ increased confidence in effective communication with clients and customers when challenging topics or situations arise. The course will build confidence by providing participants with skills and strategies for navigating difficult conversations. The course will provide examples for navigating challenging conversations on both professional and technical topics. The course will also foster a sense of community among participants and start to build a support network, especially for isolated statisticians, for continued discussion.

Wed, Feb 17
10:00 AM - 1:30 PM
Virtual

SC4 - Missing Data Methods for (Un)Commonly Used Statistics
Short Course (half day)

Instructor(s): Emile Latour, Oregon Health & Science University; Miguel Marino, Oregon Health & Science University

Missing data is a challenge that faces all practicing statisticians. Simple ad hoc methods such as listwise deletion may produce biased results or cause a loss of statistical power, leading to incorrect conclusions. This course will describe established missing data methods that overcome these challenges. We will present approaches to visually display patterns of missing data, explain missing data mechanisms, and show how to perform multiple imputation within a regression framework in R, SAS, and Stata. We will also show how to adjust statistical software syntax to apply missing data methods to less common statistics that lack built-in software support (e.g., kappa statistics, proportion of variance explained, survival probability). We will illustrate the methods using electronic health records data. Although the data example is drawn from healthcare, the general methods are transferable to other disciplines. Those with limited experience with missing data will benefit from an introduction to this topic. Experienced practitioners will also benefit from seeing how missing data methods can be adapted to statistics that cannot be derived from conventional regression models.
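A small simulation can show why listwise deletion misleads. The Python sketch below assumes a hypothetical missing-at-random mechanism in which y is missing far more often when a fully observed covariate x is negative; the complete-case mean of y then overstates the full-data mean. The setup and numbers are illustrative only and are not drawn from the course's electronic health records example.

```python
import random
from statistics import mean


def simulate_listwise_deletion(n: int = 20000, seed: int = 1) -> tuple[float, float]:
    """Return (full-data mean of y, complete-case mean of y) when y is missing
    at random (MAR): the chance y is missing depends only on the observed x."""
    rng = random.Random(seed)
    full_y, observed_y = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = 2.0 + x + rng.gauss(0, 1)        # y depends on x; true mean of y is 2
        full_y.append(y)
        # Hypothetical MAR mechanism: y is missing far more often when x < 0.
        p_missing = 0.8 if x < 0 else 0.1
        if rng.random() >= p_missing:        # y observed with probability 1 - p_missing
            observed_y.append(y)
    return mean(full_y), mean(observed_y)


full_mean, complete_case_mean = simulate_listwise_deletion()
print(f"full data: {full_mean:.2f}, complete cases only: {complete_case_mean:.2f}")
```

With these settings the complete-case mean lands near 2.5 while the full-data mean stays near 2.0. Under this particular mechanism, a regression of y on x fitted to the complete cases would not be biased, since missingness depends only on x; the bias shown here is in the marginal summary.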

Outline & Objectives

This short course will: 1) review the missing data mechanisms that give rise to missing data patterns (e.g., missing completely at random (MCAR), missing at random (MAR), and not missing at random (NMAR)); 2) describe approaches to visualizing missing data patterns, both to communicate with collaborators and to help inform decisions about how to handle the missing data; 3) introduce widely available methods for dealing with missing data (e.g., multiple imputation), including their strengths and weaknesses; and 4) show how to adjust statistical software syntax to apply missing data methods to uncommon statistics. This workshop will be valuable to statisticians and budding statisticians in any industry or discipline who work with multivariable data.
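Multiple imputation ends by pooling the analyses of the completed data sets, and Rubin's rules are the standard way to pool a statistic such as kappa once an estimate and a squared standard error are available from each imputed data set. The Python sketch below assumes those per-imputation quantities have already been computed elsewhere; the kappa values are hypothetical. Rubin's rules also assume the statistic is roughly normally distributed, so bounded statistics are sometimes transformed before pooling.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates: list[float], variances: list[float]) -> tuple[float, float]:
    """Pool m completed-data analyses with Rubin's rules.

    estimates: the statistic (e.g., a kappa) computed on each imputed data set.
    variances: the squared standard error of that statistic on each data set.
    Returns the pooled estimate and its total standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)                  # pooled point estimate
    within = mean(variances)                 # average within-imputation variance (W)
    between = variance(estimates)            # between-imputation variance (B), n - 1 denominator
    total = within + (1 + 1 / m) * between   # T = W + (1 + 1/m) B
    return q_bar, sqrt(total)


# Hypothetical kappa estimates and variances from five imputed data sets.
kappas = [0.61, 0.58, 0.64, 0.60, 0.62]
kappa_vars = [0.0016, 0.0015, 0.0017, 0.0016, 0.0016]
est, se = pool_rubin(kappas, kappa_vars)
print(f"pooled kappa = {est:.3f}, SE = {se:.4f}")
```

For these inputs the pooled kappa is 0.61, with total variance 0.0016 + 1.2 × 0.0005 = 0.0022 (SE about 0.047).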

About the Instructor

Miguel Marino, PhD is Associate Professor of Biostatistics in the Department of Family Medicine at Oregon Health & Science University (OHSU) with a joint appointment in the OHSU-PSU School of Public Health. Dr. Marino's research focuses on the implementation of novel statistical methodology to address complexities associated with the use of medical electronic health records, including issues of missing data. Dr. Marino has co-authored over 125 peer-reviewed publications and has served as co-investigator/site PI on over 20 federally funded grants from a diverse set of funders (e.g., NIH, CDC). Dr. Marino currently serves as the Publications Officer for the Health Policy Statistics Section of the ASA and as the statistical editor for the Annals of Family Medicine journal. Co-instructor Emile Latour, MS is an associate biostatistician with the OHSU Knight Cancer Institute, where he provides ongoing, extensive applied statistical support to a variety of cancer researchers and their projects. In 2018, Emile presented his work on approaches to dealing with missing data for non-traditional statistics at the ASA Conference on Statistical Practice in Portland, OR.

Relevance to Conference Goals

This workshop is relevant to the Implementation and Analysis theme. Applied statisticians are more likely than not to encounter missing data. This short course will introduce applied statisticians to simple-to-implement methods for handling the missing data they may encounter on the job. We will focus on multiple imputation for basic statistical models but will also introduce multiple imputation for non-standard statistics (e.g., kappa statistics), for which support in standard software is not fully developed. Through our example, we hope to provide guidance and a reference for others who want to apply these methods to non-standard statistics. Throughout the workshop, we will interpret established missing data techniques from the statistical literature and adapt them to the practical problems we have faced.

Wed, Feb 17
2:00 PM - 5:30 PM
Virtual

SC6 - Principles of Prediction and Inference in Machine Learning
Short Course (half day)

Instructor(s): Jeffrey D. Blume, Vanderbilt University; Thomas G Stewart, Vanderbilt University School of Medicine

Machine learning and prediction methods are now ubiquitous in popular culture and academic research. While many popular prediction algorithms were developed outside of statistics, statisticians are expected to understand these algorithms, their principles, and their behavior. In addition, statisticians are often tasked with making inferences in the context of a complex prediction model. The purpose of this short course is to (1) familiarize practitioners with essential principles for prediction and inference tasks using machine learners, (2) explain the reliance on well-defined operating characteristics, particularly out-of-sample optimism and coverage, (3) demonstrate how to compare and contrast the operating characteristics of machine learning and statistical models, and (4) promote the habit of using two aligned models, a prediction model and an inferential model, to meet specific scientific needs. We will emphasize the connection between prediction and attribution, stressing that prediction is often the easier task and comes at the expense of the ability to attribute predictive power to a particular feature. Sustained examples with R and group discussion are an integral part of the course.

Outline & Objectives

Model-building practices that benefit prediction tasks do not always benefit inferential tasks, and the reverse is also true, making prediction and inference difficult to conduct under a single model. This course is intended to provide a framework for understanding model performance, with special attention to the differences between prediction and inference. The course is organized around the concept of operating characteristics, which are the currency by which models (both prediction and inferential) are often evaluated. We introduce and discuss concepts of out-of-sample predictive accuracy, their estimation via k-fold cross-validation and bootstrapping, and optimism concepts for the prediction setting. For the inference setting, we will focus on bias, MSE, testing, and interval coverage concepts for estimators that retain meaning in complex models. We demonstrate a general-purpose approach to calculating operating characteristics regardless of the specific family of models (e.g., regression models with or without regularization, support vector machines, gradient boosted models, random forests, and neural networks). Mathematical details will be skipped in favor of applied examples using R.
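As a minimal Python sketch of the in-sample versus out-of-sample distinction, the code below fits a one-predictor least-squares line, computes its apparent (in-sample) mean squared error, and compares it with a k-fold cross-validated estimate; the gap is one simple estimate of optimism. The simulated data and function names are assumptions for illustration, and the bootstrap optimism correction the course also covers is a different estimator.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for one predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def mse(xs, ys, intercept, slope):
    """Mean squared error of the fitted line on the given points."""
    return mean((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def kfold_mse(xs, ys, k, seed=0):
    """Average squared prediction error on held-out folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]        # k roughly equal folds
    squared_errors = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        squared_errors += [(ys[i] - (a + b * xs[i])) ** 2 for i in fold]
    return mean(squared_errors)


# Hypothetical data: y = 1 + 0.5 x + noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(40)]
ys = [1.0 + 0.5 * x + rng.gauss(0, 2) for x in xs]

a, b = fit_line(xs, ys)
apparent = mse(xs, ys, a, b)
cv = kfold_mse(xs, ys, k=5)
print(f"in-sample MSE {apparent:.2f}, 5-fold CV MSE {cv:.2f}, optimism {cv - apparent:.2f}")
```

Setting k equal to the sample size gives leave-one-out cross-validation; for least squares, each held-out residual is the in-sample residual divided by one minus its leverage, so the leave-one-out error can never fall below the apparent error.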

About the Instructor

Dr. Thomas G. Stewart is Assistant Professor of Biostatistics and core faculty at the Data Science Institute at Vanderbilt. He developed a new computational and resampling curriculum for teaching statistics to data scientists. He has extensive expertise in prediction models, especially support vector machines and regularization models, and in missing data.

Dr. Blume is Vice-Chair for Education in the Department of Biostatistics and Director of Graduate Education at the Data Science Institute. He founded the Data Science Master’s program. His lab focuses on machine learning and prediction, and the role of principles of inference in large-scale settings.

Dr. Mathew Shotwell is Associate Professor of Biostatistics at Vanderbilt University and core faculty at the Data Science Institute. He has taught statistical learning in the Biostatistics graduate program for over 5 years and is currently teaching machine learning in the Data Science program.

Megan Hollister is a PhD student in Dr. Blume’s lab. She is developing methods for attributing predictive accuracy to features in complex models. She is also developing an R package for broad computation of false discovery rates.

Relevance to Conference Goals

The goal of this course is to familiarize practitioners with essential concepts for evaluating prediction models and to relate those to critical concepts for inferential tasks. At the conclusion of the course, attendees should be able to:
(1) Understand the different objectives of prediction and inference models and tasks.
(2) Identify the operating characteristics of primary importance for prediction and, similarly, for inference.
(3) Simulate operating characteristics for simple prediction and inference models.
(4) Recognize the pitfalls of variable selection techniques when constructing models for inference.
(5) Distinguish between in-sample and out-of-sample performance and understand the related concept of optimism.
These are critical skills for applied statisticians and will help practitioners better interface with machine learners out in the wild.
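For objective (3), simulating interval coverage is a natural first exercise. This Python sketch assumes normal data and the large-sample interval mean ± z·s/√n (deliberately using z rather than t) and estimates how often it covers the true mean; all settings are illustrative.

```python
import random
from statistics import NormalDist, mean, stdev


def simulate_coverage(n: int, reps: int = 2000, true_mean: float = 5.0,
                      sd: float = 2.0, level: float = 0.95, seed: int = 7) -> float:
    """Estimate how often mean +/- z * s / sqrt(n) covers the true mean
    when samples of size n are drawn from a normal distribution."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        half_width = z * stdev(sample) / n ** 0.5
        if abs(mean(sample) - true_mean) <= half_width:
            hits += 1
    return hits / reps


for n in (5, 30, 100):
    print(f"n = {n:3d}: estimated coverage {simulate_coverage(n):.3f}")
```

Coverage sits a little below the nominal 95% at n = 30 and falls to roughly 88% at n = 5, the kind of operating characteristic that a simulation makes visible without any algebra.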

Wed, Feb 17
2:00 PM - 5:30 PM
Virtual

SC7 - How to Lead Through Change and Build High-Performing Teams
Short Course (half day)

Instructor(s): Angela Demaree, PAWS Consulting, LLC

This workshop introduces participants to high-performance habits, tools, and techniques, with a focus on leading through change and building high-performing teams.

Participants will leave with simple yet effective tools that can easily be implemented in their personal and professional lives.

Outline & Objectives

Goal: To introduce participants to high performance principles and techniques to reach heightened levels of clarity, energy, courage, productivity, and influence at work and in life with a focus on leading through change and building high performing teams in a professional setting.

Outline:

Understanding High Performance, Boundaries, Values and Burnout.

Clarity: Find Your ‘Why’.

Energy: Tools for maintaining high levels of energy throughout your busy day.

Courage: Overcoming Fear and Overwhelm to lead through change.

Productivity: How do I do it ALL and Lead a Team?

Influence is not a four-letter word. (How to build high performing teams)

Staying positive, keeping an optimistic outlook, leading through change.

Engaging Employees Through Purpose

High Performance Recap

About the Instructor

Dr. Angela Demaree, a veterinarian and veteran, currently serves as the CEO and Principal Consultant for PAWS Consulting, a public health and political consulting firm. Angela recently retired as a Major in the U.S. Army Reserve and deployed in 2012 in support of Operation Enduring Freedom, where she learned strategic planning tools and techniques.

As the Equine Medical Director of the Indiana Horse Racing Commission, she successfully led and managed forty part-time intermittent employees through institutional change. She holds a Master of Public Health in Biostatistics and Epidemiology from the University of Southern California’s Keck School of Medicine and is a Certified High Performance Coach.

Angela is a member of the American Statistical Association and currently serves on the Purdue University College of Veterinary Medicine's Alumni Board, the Indiana Animal Health Foundation Board, the Indiana Veterinary Medical Association’s Innovation Task Force, and the Legislative Working Group.

She spends her free time with her horse, Tommy, and teaching her Quaker parrot the Purdue Fight Song. You can connect with Angela on Twitter and LinkedIn @DemareeDVM.

Relevance to Conference Goals

Participants will learn key leadership skills to take back to their organizations, including how to build high-performing teams and how to lead through change and uncertainty.

Wed, Feb 17
2:00 PM - 5:30 PM
Virtual

SC8 - Bootstrap Methods and Permutation Tests
Short Course (half day)

Instructor(s): Tim Hesterberg, Google

We begin with a graphical approach to bootstrapping and permutation testing, illuminating basic statistical concepts of standard errors, confidence intervals, p-values and significance tests.

We consider a variety of statistics (mean, trimmed mean, regression, etc.), and a number of sampling situations (one-sample, two-sample, stratified, finite-population), stressing the common techniques that apply in these situations. We'll look at applications from a variety of fields, including telecommunications, finance, and biopharm.

These methods let us compute confidence intervals and perform hypothesis tests when formulas are not available. This lets us do better statistics, for example by using robust methods (a median or trimmed mean instead of a mean). They can help clients understand statistical variability. And some of these methods are more accurate than standard methods.
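Here is a minimal Python sketch of the nonparametric bootstrap for a trimmed mean, a statistic with no simple standard-error formula at hand: resample the data with replacement, recompute the statistic each time, and read a standard error and a 95% percentile interval off the bootstrap distribution. The data are simulated and the function names are illustrative, not the API of the resample or S+Resample packages. Percentile intervals can be too narrow in small samples, so treat this as the quick interval rather than the more accurate one.

```python
import random
from statistics import quantiles, stdev


def trimmed_mean(values, proportion=0.1):
    """Mean after dropping the given proportion of observations from each end."""
    data = sorted(values)
    cut = int(len(data) * proportion)
    kept = data[cut:len(data) - cut]
    return sum(kept) / len(kept)


def bootstrap(values, statistic, reps=2000, seed=0):
    """Recompute the statistic on resamples drawn with replacement from the data."""
    rng = random.Random(seed)
    n = len(values)
    return [statistic(rng.choices(values, k=n)) for _ in range(reps)]


def percentile_interval_95(boot_stats):
    """2.5th and 97.5th percentiles of the bootstrap distribution."""
    cuts = quantiles(boot_stats, n=40)   # 39 cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]


# Hypothetical right-skewed sample, e.g., durations in minutes.
rng = random.Random(3)
sample = [rng.expovariate(1 / 10) for _ in range(50)]

estimate = trimmed_mean(sample)
boots = bootstrap(sample, trimmed_mean)
low, high = percentile_interval_95(boots)
print(f"10% trimmed mean {estimate:.2f}, bootstrap SE {stdev(boots):.2f}, "
      f"95% percentile interval ({low:.2f}, {high:.2f})")
```

The bootstrap standard error is simply the standard deviation of the resampled statistics, so the same three functions work unchanged for a median or any other statistic passed in place of `trimmed_mean`.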

Outline & Objectives

Introduction to Bootstrapping
General procedure
Why does bootstrapping work?
Sampling distribution and bootstrap distribution

Bootstrap Distributions and Standard Errors
Distribution of the sample mean
Bootstrap distributions of other statistics
Simple confidence intervals
Two-sample applications

How Accurate Is a Bootstrap Distribution?

Bootstrap Confidence Intervals
Bootstrap percentiles as a check for standard intervals
More accurate bootstrap confidence intervals

Significance Testing Using Permutation Tests
Two-sample applications
Other settings

Wider variety of statistics
Variety of applications
Examples where things go wrong, and what to look for

Wider variety of sampling methods
Stratified sampling, hierarchical sampling
Finite population
Regression
Time series

Participants will learn how to use resampling methods:
* to compute standard errors,
* to check the accuracy of the usual Gaussian-based methods,
* to compute both quick and more accurate confidence intervals,
* for a variety of statistics and
* for a variety of sampling methods, and
* to perform significance tests in some settings.
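For the significance-testing objective, here is a minimal Python sketch of a two-sample permutation test for a difference in means: shuffle the pooled observations, re-split them into groups of the original sizes, and see how often the shuffled difference is as extreme as the observed one. The data are hypothetical, and adding one to the numerator and denominator is one common convention for Monte Carlo p-values, not the only one.

```python
import random


def permutation_test(group_a, group_b, reps=10000, seed=0):
    """Two-sided permutation test for a difference in means.

    Under the null hypothesis the group labels are exchangeable, so shuffle
    the pooled data, re-split it into groups of the original sizes, and count
    how often the shuffled difference is at least as extreme as the observed one.
    """
    def diff_in_means(a, b):
        return sum(a) / len(a) - sum(b) / len(b)

    rng = random.Random(seed)
    observed = diff_in_means(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        # Small tolerance so floating-point rounding does not drop ties.
        if abs(diff_in_means(pooled[:n_a], pooled[n_a:])) >= abs(observed) - 1e-9:
            extreme += 1
    # Adding 1 counts the observed labeling itself and keeps p above zero.
    return observed, (extreme + 1) / (reps + 1)


# Hypothetical measurements from two groups.
a = [12.1, 14.3, 11.8, 15.2, 13.9, 14.7]
b = [10.2, 11.5, 9.8, 12.0, 10.9, 11.1]
diff, p_value = permutation_test(a, b)
print(f"observed difference {diff:.2f}, permutation p-value {p_value:.4f}")
```

For these data only 4 of the 924 possible splits give a difference at least as extreme as the observed 2.75, so the Monte Carlo p-value should land near 0.004.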

About the Instructor

Dr. Tim Hesterberg is a Senior Data Scientist at Google. He previously worked at Insightful (S-PLUS), Franklin & Marshall College, and Pacific Gas & Electric Co. He received his Ph.D. in Statistics from Stanford University, under Brad Efron.

Hesterberg wrote "What Teachers Should Know about the Bootstrap: Resampling in the Undergraduate Statistics Curriculum", The American Statistician (2015) (really, that is for every statistician), co-authored Chihara and Hesterberg "Mathematical Statistics with Resampling and R" 2e (Wiley, 2018), and wrote the "Resample" package for R and was primary author of the "S+Resample" package for bootstrapping, permutation tests, jackknife, and other resampling procedures.

Hesterberg is on the executive boards of the National Institute of Statistical Sciences and the Interface Foundation of North America (Interface between Computing Science and Statistics).

He teaches kids to make water bottle rockets, and actively fights climate chaos.
Home page at http://www.timhesterberg.net/bootstrap, and humorous bio is at https://research.google/people/TimHesterberg.

Relevance to Conference Goals

Resampling methods are important in statistical practice, but are omitted or poorly covered in many old-style statistics courses. These methods are an important part of the toolbox of any practicing statistician.

When using these methods, it is important to understand the ideas behind them, so as to know when they should or should not be used.

They are not a panacea. People tend to think of bootstrapping for small samples, when they don't trust the central limit theorem. However, the common combination of the nonparametric bootstrap and percentile intervals is actually less accurate than t procedures. We discuss why, remedies, and better procedures that are only slightly more complicated.

These tools also show how poor common rules of thumb are -- in particular, n >= 30 is woefully inadequate for judging whether t procedures should be OK.

Wed, Feb 17
2:00 PM - 5:30 PM
Virtual

SC9 - Mixed Models: A Critical Tool for Dependent Observations
Short Course (half day)

Instructor(s): Elizabeth Claassen, SAS / JMP; Ruth M Hummel, SAS Institute / JMP Division

The use of fixed and random effects has a rich history. Mixed models often go by other names, including blocking models, variance component models, nested and split-plot designs, hierarchical linear models, multilevel models, empirical Bayes, repeated measures, covariance structure models, and random coefficient models. Mixed models are one of the most powerful and practical ways to analyze experimental data, and investing time to become skilled with them is well worth the effort. Many, if not most, real-life data sets do not satisfy the standard statistical assumption of independent observations. Failure to appropriately model design structure can easily result in biased inferences. With an appropriate mixed model, we can estimate primary effects of interest as well as compare sources of variability arising from common forms of dependence among sets of observations. Mixed models can readily become the handiest method in your analytical toolbox and provide a foundational framework for understanding statistical modeling in general.

In this course we will cover many types of mixed models, including blocking, random coefficients, MLM, repeated measures, spatial models, GLMMs and NLMMs.
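As a minimal illustration of comparing sources of variability, the Python sketch below computes the classical ANOVA (method-of-moments) variance-component estimates for a balanced one-way random-effects model, y_ij = μ + b_i + e_ij, plus the intraclass correlation. It assumes a balanced design and is not how the course's software necessarily works: SAS PROC MIXED and JMP typically use REML, which for balanced one-way data generally agrees with these estimates when they are nonnegative.

```python
from statistics import mean


def one_way_variance_components(groups):
    """ANOVA (method-of-moments) estimates for a balanced one-way random-effects
    model y_ij = mu + b_i + e_ij, with b_i ~ N(0, sigma_b^2), e_ij ~ N(0, sigma_e^2).

    groups: list of equal-length lists, one per block or cluster.
    Returns (sigma_b^2, sigma_e^2, intraclass correlation).
    """
    a = len(groups)
    n = len(groups[0])
    if a < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("needs a balanced design with at least 2 groups of size >= 2")
    group_means = [mean(g) for g in groups]
    grand = mean(group_means)
    # Between-group and within-group mean squares.
    ms_between = n * sum((m - grand) ** 2 for m in group_means) / (a - 1)
    ms_within = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g) / (a * (n - 1))
    # E[MS_between] = sigma_e^2 + n * sigma_b^2; truncate a negative estimate at zero.
    sigma_b2 = max((ms_between - ms_within) / n, 0.0)
    sigma_e2 = ms_within
    icc = sigma_b2 / (sigma_b2 + sigma_e2)
    return sigma_b2, sigma_e2, icc


toy = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
sigma_b2, sigma_e2, icc = one_way_variance_components(toy)
print(f"between-group variance {sigma_b2:.3f}, residual variance {sigma_e2:.3f}, ICC {icc:.3f}")
```

For these toy groups, MS between is 27 and MS within is 1, giving a between-group variance of 26/3 and an intraclass correlation of about 0.90.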

Outline & Objectives

This course presents methodology and applications of mixed models. Material is at an applied level, accessible to those familiar with basic ANOVA and regression. We will cover:
1. Why use Mixed Models?
2. ANOVA with a Single Blocking Effect
3. Models with Factorial Treatment Designs
4. Multiple Random Effects
5. Regression, Random Coefficients, and Multilevel Models
6. Repeated Measures and Longitudinal Data
7. Spatial Models
8. Simulation and Power Analysis
9. Generalized Linear and Nonlinear Mixed Models
10. A Modern Take on Mixed Models

About the Instructor

Dr. Elizabeth A. Claassen is Senior Associate Research Statistician Developer in the JMP division of SAS. Dr. Claassen has 9 years’ experience with SAS software and 5 years’ experience with JMP. Her chief interest is generalized linear mixed models, and she brings to this work her expertise with SAS GLM, MIXED, GLIMMIX, and NLMIXED procedures for linear models. Dr. Claassen earned an MS and PhD in statistics from the University of Nebraska–Lincoln, where she received the Holling Family Award for Teaching Excellence from the College of Agricultural Sciences and Natural Resources. She is an author of the third edition of "SAS® for Mixed Models: An Introduction and Basic Applications" (2018).

Dr. Ruth Hummel is an Academic Ambassador with JMP (a division of SAS), supporting the technical needs of professors and instructors who use JMP for teaching and research. Dr. Hummel is an author of "Business Statistics and Analytics in Practice, 9th edition" (2018), and has been teaching and consulting about statistics and analytics for over a decade, at the University of Florida, at the US EPA, and now at SAS/JMP. She has a PhD in Statistics from the Pennsylvania State University.

Relevance to Conference Goals

Our proposed workshop is directly relevant to Theme 3: Implementation and Analysis. We intend to provide participants with the framework to see why mixed models are needed, the tools to correctly adopt this methodology in practice, and experience comparing incorrectly built models with appropriately specified models, so that they understand the impact of correctly applying mixed model methodology in practice.
We will be discussing applications related to:
• Modeling
• Inference and hypothesis testing
• New packages or procedures
• Implementing reproducible methods
• Evidence-guided statistical practice

Thursday, February 18

Thu, Feb 18
10:00 AM - 11:00 AM
Virtual

GS1 - Keynote Address
General Session

Practical and Adaptive Modeling to Inform Policy for the SARS-Cov-2 / COVID-19 Pandemic Response
R. David Parker, University of Alaska

Thu, Feb 18
11:00 AM - 12:30 PM
Virtual

CS01 - Building Effective Structure to Create High-Quality Output
Concurrent Session

Chair(s): Stephen Elston, Quantia Analytics, LLC

R Package Building in a Sustainable and Community-Driven Design to Promote Organization Infrastructure and Reduce Production Risk
Ben Barnard, Wells Fargo

Tips for the Data Scientist: Top 5 Reasons Why Your Data Science Project Didn’t Make It and How to Get It Right the First Time
Irina Kukuyeva, Ph.D., Kukuyeva Consulting

Thu, Feb 18
11:00 AM - 12:30 PM
Virtual

CS02 - Quick, But Can We Get It Right?
Concurrent Session

Chair(s): Charles H. Recchia, MACOM

Solving Real-World Problems, or Making Them Worse? The Collection and Analysis of Race Data
Lauren Ruth Samuels, Vanderbilt University School of Medicine

A Proposed Methodology for Executing a Survey Under an Aggressive Timeline (When You Know Nothing)
Darius Singpurwalla, National Center for Science and Engineering Statistics

Thu, Feb 18
11:00 AM - 12:30 PM
Virtual

CS03 - Time Series Methods
Concurrent Session

Chair(s): Jill Schnall, University of Pennsylvania

State Space Model Using Kalman Filter to Forecast Mortality Time Series Data in Health Science
Jae J Lee, State University of New York, New Paltz

Identifying and Describing Change Points in Time Series Using Cluster Analysis
David J. Corliss, Peace-Work

Thu, Feb 18
11:00 AM - 12:30 PM
Virtual

CS04 - Minding Our 'P's
Concurrent Session

Chair(s): Christine P. Chai, Microsoft

Moving to a World Beyond P<0.05
Ronald L Wasserstein, American Statistical Association

Thoughts on Communicating Statistics in a Post p<0.05 World: Know Your Audience(s)
Nicole Lazar, Pennsylvania State University

Thu, Feb 18
12:30 PM - 1:30 PM
Virtual

PS1 - ePoster Session 1
Poster Session

Challenges Associated with Using Aircraft Flight Density as an Estimator for the Probability of an Aircraft Impact into a Hazardous Facility
William C. Walker, Oak Ridge National Laboratory

Counting Cases, or Protecting the Workforce? Establishing Freedom-from-Disease in Occupational Settings
Annette M Bachand, Ramboll

Visualizing Dichotomous Data Correlations Using Two-Sample Corrgrams
Rohan Reddy Tummala, University of Tennessee Health Science Center

Modeling and Comparing the COVID-19 Infection Rate for Taiwan and the United States of America Using Phase-Specific Nonparametric Density Regression Technique
Mason Chen, Stanford OHS

Clustering Analyses Based on Gait Qualities of Community-Dwelling Older Adults
Anisha Suri, Swanson School of Engineering, University of Pittsburgh

Post-Stratification in Contexts Where Strata Population Counts Are Unavailable
Anja Zgodic, University of South Carolina

Statistical and Trial Design Considerations for an 1115 Medicaid Waiver in Kentucky
Elizabeth F Bair, University of Pennsylvania

Modeling and Forecasting of the COVID-19 Outbreak in Various Indian States
Abhishek Bhattacharjee, Dept. of Bioengineering, University of Illinois at Urbana-Champaign

Thu, Feb 18
1:30 PM - 3:00 PM
Virtual

CS05 - Career Paths: Student to Professional to Leader
Concurrent Session

Chair(s): David J. Corliss, Peace-Work

From Integrals and Lemmas to Influence and Leadership
Bud Sanders, Strategic Oversight Services, Inc.

Learning in Transition to the Workplace: Perspectives from the Statistical Community
Layla Guyot, University of Texas at Austin

Thu, Feb 18
1:30 PM - 3:00 PM
Virtual

CS06 - AI to the Rescue, or Rescuing AI?
Concurrent Session

Chair(s): Terrie Vasilopoulos, University of Florida

Assessing the Quality of an Audit Sample and Estimate
Zachary Rhyne, LLC

A Responsible AI Blueprint and an Executable Roadmap for the Intelligence Community
Theresa Walker, Accenture Federal Services

Thu, Feb 18
1:30 PM - 3:00 PM
Virtual

CS07 - Health Care Applications
Concurrent Session

Chair(s): Emily Robinson, University of Nebraska–Lincoln

At What Age Should Cancer Screening Be Started?
Dongfeng Wu, University of Louisville

Thu, Feb 18
1:30 PM - 3:00 PM
Virtual

CS08 - Showcasing Your Stats
Concurrent Session

Chair(s): Ralph (Mac) Turner, TBD

Building Your Audience: Crafting a Voice to Communicate Statistics to the General Public
Coleman Reed Harris, Vanderbilt University

Getting Started with Animations in R
Jonathan Kane Storey, Mississippi State University Institute for Systems Engineering Research

Thu, Feb 18
3:00 PM - 4:00 PM
Virtual

PS2 - ePoster Session 2
Poster Session

Detecting Fake Images via Multiscale Methods in High-Dimensional Data
Minsu Park, Samsung Medical Center

Data Visualization Using the Human Body as a Canvas for Data
Jennifer McGinniss, Regeneron Pharmaceuticals

Reproducible Performance Report Generation to Provide an Approach for Communicating Quality Metrics Longitudinally to Wisconsin Health Care Providers
Nicholas Marka, University of Wisconsin - Department of Surgery

Assessing the Contagiousness of Mass Shootings with Nonparametric Hawkes Processes
Peter Carson Boyd, Oregon State University

An Application of Structural Equation Modeling to the Analysis of Ordered Categorical Factors Indicating Mother-Father-Child Interactions Impacting Cognitive Development of US Children at Age Five
Aleksandra Kazakova, The Graduate Center, CUNY

Calibration of Alzheimer’s Disease Microsimulation with Approximate Bayesian Computation
Peter Tadashi Shewmaker, Brown School of Public Health

Modeling and Optimizing Job Retention: A Case Study
Ryan Christianson, Virginia Tech

Effect of Simultaneous Error Rates on Reproducibility
Scott Richter, University of North Carolina at Greensboro

Thu, Feb 18
4:00 PM - 5:30 PM
Virtual

CS10 - Cats and Boots in Production
Concurrent Session

Chair(s): Erya Huang, TBD

Sampling the Unknown: A Multinomial Approach
John Stephen Taylor, STATISTICODE, LLC

Parametric Bootstrap for Design of Experiment for Early-Life Reliability Screening of Electronics
Charles H. Recchia, MACOM

Thu, Feb 18
4:00 PM - 5:30 PM
Virtual

CS11 - Anomaly Detection
Concurrent Session

Chair(s): Emiliana Patlan, USAA

Scaling New Peaks: A Viewership-Centric Approach to Content Curation
Subhabrata Majumdar, AT&T Labs Research

WITHDRAWN: Detection and Mitigation of Anomalous Traffic Spikes in Communication Networks
Mrinmoy Bhattacharjee, Nokia

Thu, Feb 18
4:00 PM - 5:30 PM
Virtual

CS12 - Tips for Collaborative Grant Writing (Panel)
Concurrent Session

Chair(s): Layla Guyot, University of Texas at Austin

Tips for Collaborative Grant Writing
Jody D. Ciolino, Northwestern University, Feinberg School of Medicine; Masha Kocherginsky, Northwestern University, Feinberg School of Medicine; Mary J Kwasny, Northwestern University; Leah J Welty, Northwestern University

Thu, Feb 18
4:00 PM - 6:00 PM
Virtual

CS09 - Challenges in Cross-Disciplinary Collaboration
Concurrent Session

Chair(s): Xueliang Pan, The Ohio State University

Tales from the Trenches in Academic/Industry Predictive Modeling Partnerships
Jennifer H Van Mullekom, Virginia Tech

How to Work with Every Client--Or Not
Elaine Eisenbeisz, Omega Statistics; Karen Grace-Martin, The Analysis Factor; Harry Dean Johnson, Washington State University; Clark Kogan, Washington State University; Kim Love, K. R. Love QCC; Nayak Polissar, The Mountain-Whisper-Light Statistics; Stephen Simon, P.Mean Consulting

Thu, Feb 18
5:30 PM - 7:30 PM
Virtual

Trivia Night
Other

Friday, February 19

Fri, Feb 19
9:00 AM - 11:00 AM
Virtual

PCD1 - Dashboards: Conveying Your Modeling Outcomes to Enhance Audience Engagement
Practical Computing Demo

Instructor(s): Clair Alston-Knox, Predictive Analytics Group; Theo Gazos, Predictive Analytics Group

Modern technology has led to massive increases in the information (data) available to businesses and government agencies. Along with this increase in available data, management, employees, researchers, and the general public need to be presented with the salient information it provides in a form that lets them clearly and quickly see the message of any underlying analysis or summary.

Dashboards, available through web browsers or mobile technology, have emerged as an effective medium for conveying information through appropriate snapshots and trends, and they can be tailored for different audiences.

In this tutorial, we will use several case studies to give participants a basis for thinking about how they may effectively construct dashboards for their own audiences, with advice on the types of graphs and summaries that can be quickly understood, the typical detail that different users may require, layout for web vs. mobile technology, and the use of group and global filtering. Automation for periodic updating will be achieved using an intuitive GUI pipeline, illustrating how easily dashboards (and reports) can be updated, a task that was previously manual and often time-consuming.

Outline & Objectives

This tutorial is aimed at data scientists and statisticians who need to convey information to audiences beyond technical reports and scientific papers. Along with showing techniques to make dashboards interesting and attractive, we will introduce pipelines that automate updating the dashboard as new data become available and guard against unexpected employee attrition and staff changes. No prior experience with dashboards is required.

The tutorial will be case-study based, and several dashboards will be constructed for different purposes. For example, a management dashboard will be constructed for both webpage and mobile. We will use visualisations such as geo-charts and other standard charts, and then apply filters to give users easy access to the information they require.

Dashboards will be constructed with basic summary statistics and then extended using more sophisticated models, conveying predictions and trends for policy decisions, planning, and general interest. In addition, we will use other techniques, such as statistical process control, to produce real-time monitoring dashboards that may be beneficial in industry and healthcare services.
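As a rough illustration of the statistical process control idea behind a monitoring dashboard, the sketch below computes Shewhart individuals-chart limits (center line plus or minus 2.66 times the average moving range) from a baseline period and flags new observations that fall outside them. The function names and data are hypothetical, and this is not how the AutoStat pipeline used in the session implements monitoring.

```python
from statistics import mean

def individuals_limits(values):
    """Control limits (LCL, center, UCL) for a Shewhart individuals chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    center = mean(values)
    # Average moving range between consecutive observations.
    mr_bar = mean(abs(b - a) for a, b in zip(values, values[1:]))
    # 2.66 is approximately 3 / d2, with d2 = 1.128 for moving ranges of two points.
    half_width = 2.66 * mr_bar
    return center - half_width, center, center + half_width

def flag_out_of_control(baseline, new_values):
    """Indices of new observations falling outside the baseline limits."""
    lcl, _, ucl = individuals_limits(baseline)
    return [i for i, x in enumerate(new_values) if x < lcl or x > ucl]

baseline = [10.2, 9.8, 10.1, 10.0, 9.9, 10.3, 10.1, 9.7, 10.0, 10.2]
print(individuals_limits(baseline))
print(flag_out_of_control(baseline, [10.1, 11.5, 9.9]))  # only 11.5 is flagged
```

In a dashboard, the flagged indices would typically drive a highlighted point or an alert tile; the limits are computed once from a stable baseline rather than recomputed from data that may already be out of control.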

About the Instructor

Dr Theo Gazos is the Managing Director of Predictive Analytics Group. Theo has over 25 years of experience building economic and econometric models that isolate and quantify the impact of changing market dynamics (domestic and international), competition effects and government policy on private and government sector organisations. Theo is passionate about bringing the power of statistics and machine learning to all levels within organisations, and has used his years of experience to develop an interface and user flow within AutoStat® that makes this objective achievable.

Dr Clair Alston-Knox is a Senior Statistician with Predictive Analytics Group (Melbourne, Australia). She has been a research and academic statistician since 1992, holding a number of biometric and statistical consulting positions in government and universities. She joined Predictive Analytics and the AutoStat Institute in 2018 because her teaching, consulting, advising, and ethics committee roles were frequently frustrated by researchers who were very capable of understanding the objectives and benefits of statistical or machine learning approaches but did not have the resources to learn the platform required for next-level analysis.

Relevance to Conference Goals

Dashboards are a powerful tool for enabling statisticians and data scientists to better communicate and collaborate with colleagues and clients. Constructing a useful dashboard requires skill as an applied statistician, and collaboration with clients is usually very natural in this setting. Clients and colleagues are able to engage with the messages displayed in the dashboard, and feedback flows readily. This feedback loop helps develop data storytelling skills, as the statistician becomes aware of how lay people interpret visual displays, and it fosters potentially lifelong collaborative partnerships through a better understanding of how various members of the team contribute to the outcome. The use of dashboards can have a positive effect on the organisation by conveying clear messages to employees in different areas of the operation and allowing them to see company snapshots and trends in a clear format. This increased understanding allows employees to contribute to dialogue and planning based on a solid understanding of the organisation's current position, making statistical contributions an active part of that process.

Fri, Feb 19
9:00 AM - 11:00 AM
Virtual

PCD2 - JMP Statistical Discovery Software from SAS
Practical Computing Demo

Instructor(s): Ruth M Hummel, SAS Institute / JMP Division; Kevin Potcner, SAS Institute / JMP Division

Imagine this: You are a statistical consultant and your new client brings their collected data to you, explains their goals, and asks, “How do I analyze this?” You examine the spreadsheet of data and ask a few questions, only to realize that, although they collected a lot of data, each of the treatments was only applied to one large block, and there isn’t any replication to test the treatment effect! The whole experiment needs to be redone, and the resources that went into this first attempt were wasted.

This is why thoughtful design of the experiment is so critical – it can save you much time, money, and tears!

In this session we will briefly discuss WHY designed experiments are so important, and then we will cover HOW, in JMP, to quickly and easily design your experiment and generate a data table with appropriate order randomizations and with prepopulated model scripts to make analysis a one-click process once you’ve collected the data. We will also cover a few types of common designs, how to branch out into custom designs, and how to compare possible designs to pick the best one for your goals.
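The two ingredients the session emphasizes, replication and randomized run order, can be sketched outside JMP as a replicated two-level full factorial design whose runs are shuffled before execution. The factor names, levels, and the `full_factorial` helper below are hypothetical illustrations, not JMP's DOE platform or its generated scripts.

```python
import itertools
import random

def full_factorial(factors, replicates=2, seed=None):
    """Replicated two-level full factorial design in random run order.

    factors: dict mapping factor name -> (low, high) setting.
    Returns one dict per run, with 'replicate' and 'run' columns added.
    """
    names = list(factors)
    # Every combination of coded levels -1/+1 across the factors.
    base = list(itertools.product((-1, 1), repeat=len(names)))
    runs = []
    for rep in range(1, replicates + 1):
        for coded in base:
            row = {"replicate": rep}
            for name, level in zip(names, coded):
                low, high = factors[name]
                row[name] = low if level == -1 else high
            runs.append(row)
    # Randomize the run order so time trends are not confounded with treatments.
    rng = random.Random(seed)
    rng.shuffle(runs)
    for i, row in enumerate(runs, start=1):
        row["run"] = i
    return runs

design = full_factorial(
    {"temp": (150, 180), "time": (10, 20), "catalyst": ("A", "B")},
    replicates=2,
    seed=1,
)
for row in design[:4]:
    print(row)
```

With two replicates, every treatment combination is observed twice, which is exactly what the client's single-block experiment in the opening scenario lacked: without it there is no estimate of pure error against which to test the treatment effect.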

Outline & Objectives

Objectives:
Attendees should learn more about:
• How easy designing an experiment can be, even for complex scenarios.
• Common types of experimental designs and how to create them, and the flexibility of a custom optimized design and how to create one.
• How to compare and judge candidate designs, and how to explore Power and Sample Size calculations.
• How to match the analysis to the experimental design.
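For the power and sample size objective, a common back-of-the-envelope check is the normal-approximation formula for comparing two means, n per group = 2((z_{1-α/2} + z_{1-β}) σ / δ)². The sketch below assumes a two-sided test, equal group sizes, and known σ. JMP's Power and Sample Size tools may use t-based calculations, which typically give a slightly larger n.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided,
    two-sample comparison of means with common standard deviation sigma."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

def approx_power(delta, sigma, n, alpha=0.05):
    """Approximate power with n per group (ignores the negligible opposite tail)."""
    z = NormalDist()
    se = sigma * sqrt(2 / n)
    return z.cdf(delta / se - z.inv_cdf(1 - alpha / 2))

n = n_per_group(delta=0.5, sigma=1.0)
print(n, round(approx_power(0.5, 1.0, n), 3))
```

Running the two functions against each other is a quick sanity check: the sample size returned by `n_per_group` should yield an approximate power at or just above the target.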

Outline:
Introduction to Design of Experiments
• What is DOE?
• Conducting Ad Hoc and One-Factor-at-a-Time (OFAT) Experiments
• Why Use DOE?
• Types of Experimental Designs

Factorial Experiments
• Designing Factorial Experiments
• Analyzing Full Factorial Experiments

Custom Designs
• Options for Factors
• Quick Overview of Optimality

Case Study
• Defining the Problem and the Objectives
• Identifying the Responses
• Identifying the Factors and Factor Levels
• Identifying Restrictions and Constraints
• Preparing to Conduct the Experiment
• Analysis

About the Instructor

Kevin Potcner is an Academic Ambassador with JMP (a division of SAS), working with professors and researchers to use JMP. Kevin also teaches predictive analytics and data mining in the MBA program at the University of San Francisco, and he serves on the Data Science Advisory Board at California State University, Fullerton. He has an MS in Statistics from the University of Florida.
Ruth Hummel is also an Academic Ambassador with JMP, supporting the technical needs of professors and instructors who use JMP for teaching and research. Ruth is an author of Business Statistics and Analytics in Practice, 9th edition, and has been teaching and consulting about statistics and analytics for over a decade, at the University of Florida, at the US Environmental Protection Agency, and now at SAS/JMP. She has a PhD in Statistics from The Pennsylvania State University.

Relevance to Conference Goals

This session directly addresses “Theme 2: Study Design and Data Management” and “Theme 3: Implementation and Analysis” by covering the topics of how to design a study, how to compare potential study designs, how to investigate power and sample size concerns, and how to match the analysis of the data to the original experimental design.

Fri, Feb 19
9:00 AM - 11:00 AM
Virtual

PCD3 - Causal Inference Using Stata: Estimating Treatment Effects with Observational Data
Practical Computing Demo

Instructor(s): Chuck Huber, StataCorp LLC

Observational data often come with challenges that the data analyst needs to address. Treatment status or the exposure of interest may not be assigned randomly. Data are sometimes missing not at random (MNAR), which can lead to sample-selection bias. And statistical models for these data often need to account for unobserved confounding.

Join Chuck Huber, Director of Statistical Outreach, as he shows you how you can use standard maximum-likelihood estimation to fit extended regression models (ERMs) that deal with all of these common issues. He will work examples that demonstrate how to account for these observational data problems when they arise individually and when they occur simultaneously.

Outline & Objectives

1. Overview of the potential-outcomes framework for causal inference
o Stable unit treatment value assumption
o Potential-outcome means
o Average treatment effects
o Average treatment effects on the treated
2. Estimating treatment effects
o Using the regress and margins commands
o Using the teffects commands
- Regression adjustment
- Inverse probability weighting
- Propensity score matching
- Covariate distance matching
3. Estimating treatment effects with complications
o Estimating treatment effects while accounting for unobserved confounders
o Estimating treatment effects with sample selection (data missing not at random)
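To make the potential-outcomes ideas concrete, the sketch below simulates a toy observational data set in which a binary confounder x drives both treatment and outcome (true ATE = 2 by construction), then compares a naive difference in means with regression adjustment and inverse probability weighting. This only mirrors the arithmetic behind those estimators; it is not Stata's teffects, which fits parametric propensity and outcome models. With a binary confounder, a saturated outcome model, and propensities estimated as the share treated within each level of x, the two adjusted estimators coincide exactly.

```python
import random

def simulate(n, seed=0):
    """Toy data: binary confounder x affects treatment t and outcome y.
    The true average treatment effect (ATE) is 2 by construction."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = int(rng.random() < 0.5)
        t = int(rng.random() < (0.8 if x else 0.2))  # treatment not randomly assigned
        y = 1 + 2 * t + 3 * x + rng.gauss(0, 1)
        rows.append((x, t, y))
    return rows

def avg(values):
    values = list(values)
    return sum(values) / len(values)

def naive_difference(rows):
    """Difference in observed group means; confounded by x."""
    return avg(y for x, t, y in rows if t == 1) - avg(y for x, t, y in rows if t == 0)

def regression_adjustment(rows):
    """Saturated outcome model (cell means by t and x); average the predicted
    treated-minus-untreated difference over every unit's x."""
    cell = {(t, x): avg(y for xi, ti, y in rows if ti == t and xi == x)
            for t in (0, 1) for x in (0, 1)}
    return avg(cell[(1, x)] - cell[(0, x)] for x, _, _ in rows)

def ipw(rows):
    """Normalized inverse probability weighting; the propensity score is
    estimated as the share treated within each level of x."""
    e = {x: avg(t for xi, t, _ in rows if xi == x) for x in (0, 1)}

    def weighted_mean(pairs):
        pairs = list(pairs)
        return sum(y * w for y, w in pairs) / sum(w for _, w in pairs)

    mu1 = weighted_mean((y, 1 / e[x]) for x, t, y in rows if t == 1)
    mu0 = weighted_mean((y, 1 / (1 - e[x])) for x, t, y in rows if t == 0)
    return mu1 - mu0

rows = simulate(20000)
print(round(naive_difference(rows), 2),
      round(regression_adjustment(rows), 2),
      round(ipw(rows), 2))
```

The naive contrast lands near 3.8 because treated units are disproportionately those with x = 1; both adjusted estimators recover an effect near 2.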

About the Instructor

Joerg Luedicke is a Senior Social Scientist and Statistician at StataCorp LLC. Joerg was the lead developer of Stata's latest discrete choice model commands and helped develop Stata's teffects suite of commands. Prior to joining StataCorp, Joerg earned a PhD in Sociology from Bielefeld University, Germany.

Relevance to Conference Goals

This proposed practical computing demonstration is primarily relevant to the "Implementation and Analysis" theme of the conference. We will present state-of-the-art statistical methods for estimating treatment effects with observational data. It is also relevant to the "Study Design and Data Management" theme because researchers need a good understanding of causal inference methods and treatment effects estimation at the design stage of a study. Because the demonstration includes a number of hands-on examples with discussion of how to interpret and communicate the results, it also touches on the "Effective Communication" theme of the conference.

Fri, Feb 19
9:00 AM - 11:00 AM
Virtual

PCD4 - WesDaX®: An Online Analysis and Reporting Platform
Practical Computing Demo

Instructor(s): Tom Krenzke, Westat; Naomi Yount, Westat

WesDaX® is an online analysis and reporting platform created by Westat (www.wesdax.com) that runs in any standard web browser and requires neither code writing nor experience with analysis software. WesDaX supplements reporting for research projects and allows staff, clients, collaborators, and stakeholders to run analyses from microdata. The demonstration will begin with some background on WesDaX: what it can do and what is unique about it. A tour through a public data suite will follow, demonstrating analyses of American Community Survey data and Behavioral Risk Factor Surveillance System data. The main objective of the presentation is to raise awareness of the tool, which can benefit the audience and touches on several conference themes, such as educating others about data from surveys, evidence-guided statistical practices, and reproducible evidence. WesDaX analysis results are powered by WesVar (the analytic engine that computes the estimates) and are generated appropriately from complex sample data, with statistical testing. An option for advanced confidentiality protection guards against table differencing attacks.

Outline & Objectives

The objective of the course is to empower users to take an in-depth first look at data from sample surveys while generating statistics that account for the complex survey design in variance estimation and statistical testing. An outline of the course is as follows:
1. Introduction to WesDaX®
a. Video
b. Key features
c. Architecture – WesDaX interface, WesVar analytic engine
2. Statistical methods
a. Point estimation
b. Variance estimation
c. Statistical testing
d. Disclosure avoidance
3. Landing page
a. White paper
b. Guide
c. Users
d. Public use suite
e. Demo – BRFSS
4. Exercises
5. Summary
a. Educating others about data from surveys
b. Evidence-guided statistical practices
c. Reproducible evidence
d. How to get started
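The point and variance estimation steps in the outline can be illustrated with replication, the general approach behind WesVar-style variance estimation. The sketch below assumes the simplest case, an unstratified design with primary sampling unit (PSU) identifiers and full-sample weights, and uses the delete-one-PSU jackknife (JK1). The helper names are hypothetical, and the replication method WesDaX actually applies to a given survey may differ (for example, stratified jackknife or balanced repeated replication).

```python
def weighted_mean(values, weights):
    """Weighted point estimate of a mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def jk1_replicate_weights(weights, psu):
    """Delete-one-PSU jackknife replicate weights for an unstratified design:
    zero out one PSU and inflate the rest by G / (G - 1)."""
    groups = sorted(set(psu))
    g = len(groups)
    return [[0.0 if p == dropped else w * g / (g - 1) for w, p in zip(weights, psu)]
            for dropped in groups]

def jk1_variance(values, weights, psu):
    """Return (estimate, JK1 variance) for a weighted mean."""
    theta = weighted_mean(values, weights)
    replicates = jk1_replicate_weights(weights, psu)
    g = len(replicates)
    thetas = [weighted_mean(values, rw) for rw in replicates]
    return theta, (g - 1) / g * sum((t - theta) ** 2 for t in thetas)

values = [3.0, 5.0, 4.0, 6.0, 2.0, 7.0, 5.0, 4.0]
weights = [10, 10, 12, 12, 8, 8, 15, 15]
psu = [1, 1, 2, 2, 3, 3, 4, 4]
est, var = jk1_variance(values, weights, psu)
print(round(est, 3), round(var ** 0.5, 3))  # estimate and its standard error
```

A convenient check: with equal weights and one observation per PSU, the JK1 variance of the mean reduces to the familiar s²/n.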

About the Instructor

Tom Krenzke is a Vice President and Associate Director in Westat’s Statistics and Evaluation Sciences Unit, and has about 30 years of experience in survey sampling and estimation techniques. Mr. Krenzke adds new statistical capabilities by developing software for statistical disclosure control, nonresponse bias analysis, area sampling, and imputation. Mr. Krenzke is a Fellow of the American Statistical Association (ASA) and leads Westat’s Steering Committee on WesDaX (Westat’s real-time online table generator).

Naomi Yount, Ph.D., is an industrial/organizational psychologist and Westat Senior Study Director with more than 15 years of experience in organizational research. She has expertise in a variety of research methodologies, from qualitative interviewing to quantitative analyses of survey and other organizational data. At Westat, Dr. Yount conducts analyses such as psychometric analyses for new or revised surveys and key driver analyses predicting outcomes such as turnover or employee engagement.

Relevance to Conference Goals

WesDaX incorporates best practices in generating tabular statistics and conducting statistical tests from complex survey data without requiring any programming. As part of a data management toolkit, WesDaX provides an efficient way to disseminate aggregated data to the public. This tool can help statisticians and project managers use data to improve their ability to communicate with and aid customers and organizations, and to have a positive impact on their own organizations. Furthermore, the course will demonstrate benefits related to data preparation for tabulations and will provide illustrative data analysis examples, which focus on a variety of data types from varied applied settings that support evidence-guided statistical practice.

Fri, Feb 19
9:00 AM - 11:00 AM
Virtual

T1 - Regression-Style Modeling with Variable Selection and Reduction
Tutorial

Instructor(s): Clay Barker, SAS / JMP; Ruth M Hummel, SAS Institute / JMP Division

Variable selection is a crucial step in the model building process, whether we are building a predictive model or trying to understand the results of a designed experiment. Generalized Regression modeling provides a single framework for interactive variable selection and fitting generalized linear models. This workshop will start with a brief overview of the generalized linear model for modeling responses that are not necessarily normally distributed. We will also introduce variable selection techniques, including stepwise methods like Forward Selection and penalized regression methods like the Lasso. We close the workshop with examples featuring both observational and experimental data and a variety of response types.
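To show how a penalized method like the Lasso performs variable selection, here is a minimal sketch of cyclic coordinate descent with soft-thresholding, a standard algorithm for the Lasso, assuming a Gaussian response, standardized predictors, and a single fixed penalty. The data are simulated (x3 is irrelevant by construction), the function names are hypothetical, and JMP's Generalized Regression platform may use different algorithms, scaling, and tuning.

```python
import random

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; return exactly zero inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def standardize(column):
    """Center a column and scale it so that mean(x**2) == 1."""
    n = len(column)
    m = sum(column) / n
    s = (sum((v - m) ** 2 for v in column) / n) ** 0.5
    return [(v - m) / s for v in column]

def lasso(columns, y, lam, sweeps=200):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1 on standardized X.
    Returns coefficients on the standardized scale (intercept = mean of y)."""
    n = len(y)
    X = [standardize(col) for col in columns]
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residuals when every coefficient is zero
    b = [0.0] * len(X)
    for _ in range(sweeps):
        for j, xj in enumerate(X):
            # Inner product of x_j with the partial residual that excludes x_j.
            rho = sum(xi * ri for xi, ri in zip(xj, resid)) / n + b[j]
            new = soft_threshold(rho, lam)
            if new != b[j]:
                step = new - b[j]
                resid = [ri - step * xi for xi, ri in zip(xj, resid)]
                b[j] = new
    return b

rng = random.Random(42)
n = 200
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
x3 = [rng.gauss(0, 1) for _ in range(n)]  # irrelevant predictor
y = [3 * a - 2 * c + rng.gauss(0, 0.5) for a, c in zip(x1, x2)]
print([round(v, 2) for v in lasso([x1, x2, x3], y, lam=0.5)])
```

The soft-threshold step is what distinguishes the Lasso from ridge-style shrinkage: the irrelevant predictor's coefficient is set exactly to zero rather than merely made small, while the active coefficients are shrunk toward zero by about the penalty. A very large penalty removes every predictor.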

Outline & Objectives

Outline
1. Brief Overview of Generalized Linear Models
2. Intro to Stepwise Variable Selection Methods
3. Intro to Penalized Regression Methods
4. Examples with nonnormal distributions, censoring, multicollinearity, etc.

(a) Performance objectives
By attending this presentation, participants will improve their knowledge of generalized linear models and variable selection techniques. They will also feel comfortable using these methods in software.
(b) Content and instructional methods
The presentation will alternate between the use of slides and software demonstrations. Handouts given to attendees will cover both.

About the Instructor

Dr. Clay Barker is a Senior Research Statistician Developer with JMP (a division of SAS) who works on a variety of statistical platforms in JMP, including Generalized Regression, Fit Curve, and Clustering. He earned his doctorate in statistics from North Carolina State University. He holds several patents, including one for his work on implementing new visualizations for interactive model building in generalized regression.

Dr. Ruth Hummel is an Academic Ambassador with JMP (a division of SAS), supporting the technical needs of professors and instructors who use JMP for teaching and research. Dr. Hummel is a coauthor of Business Statistics and Analytics in Practice, 9th edition (2018), and has been teaching and consulting about statistics and analytics for over a decade, at the University of Florida, at the US Environmental Protection Agency, and now at SAS/JMP. She has a PhD in Statistics from The Pennsylvania State University.

Relevance to Conference Goals

Career Development - Building regression models is a crucial part of data analysis. Sharpening these skills in modern software can be helpful for statisticians in every stage of their career. Performing variable selection in an interactive environment makes it quick and easy to communicate results and assess tradeoffs of different models.

Implementation and Analysis - We will be discussing applications related to:
• Modeling
• Inferential and hypothesis testing
• Predictive analytics
• New packages or procedures
• Analytics, big data, and unstructured data analytic methods
• Machine learning
• Implementing reproducible methods
• Evidence-guided statistical practice

Fri, Feb 19
9:00 AM - 11:00 AM
Virtual

T2 - Bayesian Analytics in Practice
Tutorial

Instructor(s): Sujit Kumar Ghosh, North Carolina State University; Amy Shi, SAS Institute, Inc.

The Bayesian paradigm provides a natural and practical way to build analytical models by expressing complicated models through a sequence of simple conditional models, making it useful for both simple and complex data structures. This tutorial will begin with a brief introduction to Bayesian hierarchical models and then expand to more realistic and complex models that have recently emerged in the machine learning literature. All of these models will be illustrated through practical applications and worked-out examples without getting into the theoretical underpinnings. Participants with basic knowledge of probability theory and the statistical inferential framework will find the tutorial useful in expanding their standard toolkit to advanced Bayesian analytical methods. The concepts and methods discussed are demonstrated using R and SAS software, including tools developed by the presenters, but they are applicable to any modern Bayesian software package.

Outline & Objectives

Part I - Introduction to Bayesian Hierarchical Models
1. Basic components of Priors, Likelihood and Posterior;
2. Predictive Distributions;
3. Computational Methods using Monte Carlo Simulations

Part II - Primer on R and SAS
1. JAGS through R
2. SAS through PROC MCMC and PROC BGLIMM

Part III – Hierarchical Models in Practice
1. Linear and generalized linear models;
2. Multi-level models;
3. Penalized regression models with missing data
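The Part I building blocks (prior, likelihood, posterior, predictive distribution, and Monte Carlo summaries) can be seen in the simplest conjugate case: a Beta prior on a success probability combined with binomial data. This single-level sketch with made-up counts is only an illustration of those components; it is not a hierarchical model, and tools such as JAGS or PROC MCMC/PROC BGLIMM use MCMC for models that lack a closed-form posterior.

```python
import random

def beta_binomial_posterior(successes, trials, a=1.0, b=1.0):
    """A Beta(a, b) prior combined with a binomial likelihood gives a Beta posterior."""
    return a + successes, b + trials - successes

def posterior_summary(a_post, b_post, draws=20000, seed=0):
    """Monte Carlo summaries: posterior mean, 95% equal-tailed interval, and
    the posterior predictive probability that the next trial succeeds."""
    rng = random.Random(seed)
    p = sorted(rng.betavariate(a_post, b_post) for _ in range(draws))
    post_mean = sum(p) / draws
    interval = (p[int(0.025 * draws)], p[int(0.975 * draws) - 1])
    # Predictive distribution: draw p from the posterior, then a new outcome given p.
    next_success = sum(rng.random() < pi for pi in p) / draws
    return post_mean, interval, next_success

a_post, b_post = beta_binomial_posterior(successes=7, trials=20)  # Beta(8, 14)
print(posterior_summary(a_post, b_post))
```

Because this posterior is known exactly, its mean (8/22) gives a check on the Monte Carlo output; in the hierarchical models covered later, simulation is the only practical route to the same summaries.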

This tutorial aims to familiarize attendees with the essential concepts and computational methods of Bayesian analytics in areas where hierarchical modeling is conducted. They will learn how to deal with practical issues that arise in Bayesian analysis, especially in multilevel modeling. Another major goal is to help attendees become comfortable using software to conduct Bayesian inference with machine learning models.

About the Instructor

Professor Sujit Kumar Ghosh is currently a Full Professor in the Department of Statistics at North Carolina State University (NCSU). He has over 25 years of experience in conducting, applying, evaluating, and documenting statistical analysis of biomedical and environmental data. Prof. Ghosh is actively involved in teaching, supervising, and mentoring graduate students at the doctoral and master's levels. He has supervised over 35 doctoral students and recently published a popular book titled "Bayesian Statistical Methods," co-authored with Brian Reich, which is being used as a textbook at several universities. He is an elected fellow of the ASA and has also served as the Deputy Director at SAMSI (NC).

Amy Shi is a senior research statistician developer in the Advanced Statistical Methods Department at SAS Institute Inc. Her main responsibility is developing and enhancing the Bayesian capabilities of SAS software, with a focus on generalized linear mixed models, discrete choice models, and multilevel hierarchical settings. She is the developer of the BGLIMM procedure. She has a PhD in biostatistics from the University of North Carolina at Chapel Hill.

Relevance to Conference Goals

Bayesian methods are becoming ever more popular in many applied fields. We have designed the tutorial from a practical point of view, covering a wide range of commonly encountered hierarchical analytical models applied to several different data structures encountered in practice (from simple rectangular data to complex data structures with missing and/or censored observations). The course emphasizes the practical aspects of Bayesian computational methods using industry-standard software. The how-to part of the course is presented using a variety of worked-out examples from different applied settings, with code explained in detail.

Fri, Feb 19
9:00 AM - 11:00 AM
Virtual

T3 - Tidyverse Tools in R for Data Science and Statistical Inference
Tutorial

Instructor(s): Chester Ismay, DataRobot; Jessica Minnier, Oregon Health & Science University

Many statisticians use R to clean, manage, and analyze data. Recently, a philosophy promoting tidy data (uniform in shape, one observation per row, one variable per column) and tidy code (readable, consistent across tasks) has risen in popularity. The “tidyverse” is a term for a collection of R packages that embrace this philosophy and are designed to work together to improve readability of code and reproducibility of workflows. The tidyverse is especially accessible to beginners as it allows students to dive into writing tidy code and quickly perform data wrangling, data reshaping, and data visualization tasks. It is also useful in professional data science and statistics as it provides a cohesive architecture for analyzing tidy data. This workshop will introduce tidyverse core and community packages for data processing and visualization (e.g. dplyr, janitor, ggplot2) and showcase new packages (moderndive, infer) designed to extend the tidyverse to a common framework for statistical inference. Using practical examples with large and complex R datasets (e.g. gapminder), we’ll show how to use tidyverse tools to create readable and accessible code for novices and experts alike.

Outline & Objectives

Participants will learn how to tackle the data analysis workflow from start (wrangling, tidying) to finish (visualization, analysis, inference) using tidyverse tools in R. Participants will be able to incorporate any of these tools into their own coding practices as each component can stand alone and also work seamlessly together as a whole. Topics include:

Data Wrangling using the dplyr package
Data Visualization using the ggplot2 package
Data Tidying using the tidyr and janitor packages
Resampling using the moderndive package with dplyr
Statistical inference using the infer package

Intended level: R novices to intermediate R users. Advanced R users may also find it of interest as a simplified way to perform statistical inference using a simulation-based approach.

Prerequisites: Some experience with R is helpful. This workshop is interactive, so participants are encouraged to bring a laptop with the latest versions of R and RStudio installed. This workshop will only use free and open-source software.

About the Instructor

Chester Ismay leads in-person workshops on data science, machine learning, and data engineering as a Data Science Evangelist for DataRobot University at DataRobot. He obtained a PhD in Statistics from Arizona State University and has taught courses and led workshops in mathematics, computer science, statistics, data science, and sociology. He is co-author of the fivethirtyeight, infer, and moderndive R packages and author of the thesisdown R package. He is also a co-author of Statistical Inference via Data Science: A ModernDive into R and the Tidyverse, an open-source textbook for introductory statistics and data science using R.

Jessica Minnier is an Associate Professor of Biostatistics at OHSU who collaborates with researchers and clinicians in the medical and healthcare fields. She uses R for data cleaning, analysis, and visualization. She has experience teaching statistical methods and programming to graduate students in statistics, public health, biology, and medicine, as well as to her statistician peers. She has helped develop R and statistics educational materials, including an interactive R Bootcamp, various recorded R workshops, and interactive R Shiny tutorials.

Relevance to Conference Goals

The tidyverse is designed to produce easily readable code, which greatly improves collaboration and communication. These methods can be applied across many disciplines and can build programming confidence in beginners. In addition, hypothesis tests and confidence intervals are at the heart of many applied statisticians' roles. The moderndive and infer packages presented here extend the tidyverse philosophy so that statistical inference also follows a friendly, explicit syntax. This workshop will teach participants a new way to think about these inferential concepts using R packages designed to make their analyses simpler, more reproducible, and easier to adapt across projects. Working through bootstrapping and permutation tests computationally, as implemented in the infer and moderndive packages, supports professional development and growth in statistical inference. The infer package, together with other tidyverse packages, also connects these methods to traditional tests such as the t-test and ANOVA. Participants can extend this approach to their own analyses and showcase it in their organizations.

Fri, Feb 19
9:00 AM - 11:00 AM
Virtual

T4 - Introduction to BlueSky Statistics
Tutorial

Instructor(s): Robert Anthony Muenchen, University of Tennessee

BlueSky Statistics is an open-source graphical user interface to the R language. It uses menus and dialog boxes to manage data, create plots, and analyze data. By default, the R code that it writes is hidden from the user, but it can be displayed and modified before execution.

BlueSky helps non-programmers get their work done, and it helps students of R learn R code. For advanced users, it offers a way to convert code into dialogs, enabling more effective interaction between programmers and non-programmers within an organization.

Outline & Objectives

1) Importing Data from Excel, SAS, SPSS, Stata, web survey tools, etc.
2) Managing Data
a) Computing new variables
b) Conditional transformations
c) Factor level management (uses forcats package)
d) Missing value imputation (uses simputation package)
e) Ranking
f) Recoding
g) Reshaping data (uses tidyr package)
h) Subsetting data
i) Splitting – test/train & group by processing
3) Graphics (uses ggplot2 package)
a) Grammar of Graphics concepts (faceting, etc.)
b) Bar charts
c) Density
d) Frequency
e) Histograms
f) Line charts
g) Q-Q plots
h) Scatterplots
4) Basic Analysis
a) Summaries & frequencies
b) Crosstabulation
c) Group comparisons (parametric & non-parametric)
d) Tables – easily generate complex tables of statistics and statistical tests (uses the arsenal package)
5) Model Building
a) Linear regression
b) Logistic regression
c) ANOVA
6) Model Tuning
a) Overview of machine learning methods available
b) Overview of cross-validation methods available
7) Model Statistics – scoring, ROC curves, various metrics & diagnostics
8) Setting options – APA journal style, significant digits, etc.
9) Reproducibility & R Markdown
10) Syntax editor

About the Instructor

Robert A. Muenchen is the author of R for SAS and SPSS Users, and co-author of R for Stata Users and An Introduction to Biomedical Data Science. He is also the creator of r4stats.com, a popular web site devoted to analyzing trends in data science software, reviewing such software, and helping people learn the R language.

Bob is an ASA Accredited Professional Statistician™ who focuses on helping organizations migrate from SAS, SPSS, and Stata to the R Language. He has taught workshops on data science topics for more than 500 organizations and has presented workshops in partnership with the American Statistical Association, RStudio, DataCamp.com, and Revolution Analytics. Bob has written or co-authored over 70 articles published in scientific journals and conference proceedings and has provided guidance on more than 1,000 graduate theses and dissertations at the University of Tennessee.

Relevance to Conference Goals

Many organizations have statisticians and data scientists who are expert programmers. They also have staff members who need to analyze data, but who lack the time or inclination to become good programmers. Software like BlueSky Statistics can help these two types of analysts work together more effectively by using the same R software for their work. Programmers can extend BlueSky’s capabilities with menus and dialogs that control their R code. People wishing to learn R can do so by studying the code that BlueSky writes for them.

Fri, Feb 19
11:00 AM - 1:00 PM
Virtual

CS13 - Facing Organizational and Ethical Considerations Resulting from COVID-19
Concurrent Session

Chair(s): Yimei Li, University of Pennsylvania

Data Science During a Crisis: Serving Organizational Needs and High-Risk Patients in the COVID-19 Pandemic
David Shilane, Columbia University

Ethics Panel: Data and Analytic Issues in the Age of COVID-19
David J. Corliss, Peace-Work; R. David Parker, University of Alaska; Julia Sharp, Colorado State University; David Shilane, Columbia University; Suzanne Thornton, Swarthmore College

Fri, Feb 19
11:00 AM - 12:30 PM
Virtual

CS14 - Leveraging More Information
Concurrent Session

Chair(s): Jessica Thomson, USDA

The NCHS Research and Development Survey
Jennifer D Parker, NCHS

A Framework for Improving the Efficiency of Operational Testing Through Bayesian Adaptive Design
Victoria Rose Carrillo Sieck, University of New Mexico / Air Force Institute of Technology

Fri, Feb 19
11:00 AM - 12:30 PM
Virtual

CS15 - Bayesian Applications
Concurrent Session

Chair(s): Clark Kogan, Washington State University

Modeling Age-Adjusted Rates from Spatio-Temporal Data Sets with Excess Zero Counts
Melissa Jay, University of Iowa

Calibration of a Microsimulation Model Using Approximate Bayesian Computation
Peter Tadashi Shewmaker, Brown School of Public Health

Fri, Feb 19
11:00 AM - 12:30 PM
Virtual

CS16 - Building Communication Skills at All Levels
Concurrent Session

Chair(s): Coleman Reed Harris, Vanderbilt University

Developing Well-Rounded Statistical Collaboration Skills with Case-Based Learning
Mario Davidson, Vanderbilt University School of Medicine

Communication in Statistical Collaborations: Creating Shared Understanding
Eric Vance, LISA-University of Colorado Boulder

Fri, Feb 19
12:30 PM - 1:30 PM
Virtual

PS3 - ePoster Session 3
Poster Session

More Data More Problems: R Shiny to the Rescue in Interpreting a Large-Scale Transcriptome-Wide Association Study
Amanda Lucille Tapia, Department of Biostatistics, University of North Carolina - Chapel Hill

Effective Statistical Consulting: A Framework for Overcoming Communication Barriers in the Collaboration
Kevin Rion, Bridgewater State University

A Taxonomic Approach at Characterizing and Predicting Texas Coast Hurricanes
Samuel Greer, Texas State University

Using Effect Modification Analysis of Inflammatory Markers to Understand Potential COVID-19-Related Prognosis from Nationally Representative Data
Srikanta K Banerjee, Walden University College of Health Professions

Visualization Techniques to Help Determine Whether the Continuous Interaction Effect Assumption Is Met Within the Standard Interaction Model
Shane J Sacco, University of Connecticut

Spatio-Temporal Analysis of Particulate Matter Based on Quantile Factor Model
Minji Kim, Seoul National University

STEAMS Methodology of Sports Science and Injury
Lily Sun, Stanford Online High School

Advancing Advancement with Data Science
Christopher Grubb, Virginia Tech

Simplify Poker Game for Predicting the Winning Probability Based on Risk Management
Saloni Patel, Stanford Online High School

Fri, Feb 19
1:30 PM - 3:00 PM
Virtual

CS17 - Complex Data and Designs
Concurrent Session

Chair(s): Eric B. Stephens, Nashville General Hospital

Estimating Speech and Language Disorder from a Nationally Representative Sample Using SAS Survey Procedures vs. SAS-Callable SUDAAN
Sana Nasir Charania, United States Centers for Disease Control and Prevention

HDSI: High-Dimensional Selection with Interactions Algorithm on Feature Selection and Testing
Rahi Jain, Princess Margaret Cancer Centre

Fri, Feb 19
1:30 PM - 3:00 PM
Virtual

CS18 - Controlling Text and Texting Controls
Concurrent Session

Chair(s): Mary J Kwasny, Northwestern University

Guidelines in Selecting Appropriate Text Preprocessing Methods
Christine P. Chai, Microsoft

Evaluating Policy and Quantifying Uncertainty with Few (or One) Treated Unit(s): An Introduction to Synthetic Control Methods and Falsification Analyses
Sydney Kahmann, UCLA Statistics

Fri, Feb 19
1:30 PM - 3:00 PM
Virtual

CS19 - Modeling Topics
Concurrent Session

Chair(s): Jennifer H Van Mullekom, Virginia Tech

Model Evaluation Metrics
Jennifer Svrlinga, Internal Revenue Service

Structural Equation Modeling with Count Variables
Kevin Landis McKee, Virginia Tech

Fri, Feb 19
1:30 PM - 3:00 PM
Virtual

CS20 - Engaging in Difficult Conversations with Nonstatisticians (Panel)
Concurrent Session

Chair(s): Thomas G Stewart, Vanderbilt University School of Medicine

Engaging in Tough Conversations with Nonstatisticians
Emily Griffith, North Carolina State University; Megan Higgs, Critical Inference LLC; Julia Sharp, Colorado State University; Zach Weller, Colorado State University

Fri, Feb 19
3:00 PM - 4:30 PM
Virtual

GS2 - Closing Session
General Session

Chair(s): David J. Corliss, Peace-Work

The Closing Session is an opportunity for you to interact with the CSP Steering Committee in an open discussion about how the conference went and how it could be improved in future years. David Corliss, vice chair of the CSP Steering Committee, will lead a panel of committee members as they summarize their conference experience. The audience will then be invited to ask questions and provide feedback. The committee highly values the suggestions for improvement gathered during this time. The best student poster award will also be presented during the Closing Session, and each attendee will have an opportunity to win a door prize.

American Statistical Association