Thursday, February 20

Thu, Feb 20
7:00 AM - 6:30 PM
Ballroom Foyer

Registration
Registration

Thu, Feb 20
8:00 AM - 5:30 PM
Regency A

SC1 - The Tlverse Software Ecosystem for Targeted Learning
Short Course (full day)

Instructor(s): Alan Hubbard, University of California, Berkeley; Mark van der Laan, University of California, Berkeley

This full-day short course will provide a comprehensive introduction to the field of targeted learning and the corresponding tlverse software ecosystem (https://github.com/tlverse). In particular, we will focus on targeted minimum loss-based estimators of causal effects, including those of static, dynamic, optimal dynamic, and stochastic interventions. These multiply robust, efficient plug-in estimators use state-of-the-art, ensemble machine learning tools to flexibly adjust for confounding while yielding valid statistical inference. In addition to discussion, this workshop will incorporate both interactive activities and hands-on, guided R programming exercises, to allow participants the opportunity to familiarize themselves with methodology and tools that will translate to real-world data analysis. It is highly recommended for participants to have an understanding of basic statistical concepts such as confounding, probability distributions, confidence intervals, hypothesis tests, and regression. Advanced knowledge of mathematical statistics may be useful but is not necessary. Familiarity with the R programming language will be essential.

Outline & Objectives

By the end of this course participants should be able to:

1. Discuss the utility of the robust estimation strategy of targeted learning in comparison to conventional techniques, which often rely on restrictive statistical models and may therefore lead to severely biased inference.

2. Utilize the super learner, a loss-function-based tool that uses V-fold cross-validation, to obtain the best prediction of the parameter of interest.

3. Calculate nonparametric variable importance metrics with both the super learner and targeted minimum loss-based estimators.

4. Estimate the causal effect of an intervention under static, dynamic, optimal individualized, and stochastic regimes using the tlverse.

5. Implement targeted minimum loss-based estimators when the outcome is subject to missingness, when mediators are present on the causal pathway, in high dimensions, and in studies with two-phase sampling.

6. Interpret the effect of interest under the real-world scenarios mentioned in learning objectives 4 and 5.

7. Construct novel targeted minimum loss-based estimators to extend the tlverse ecosystem of R packages.
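
As a rough illustration of objective 2, the sketch below shows the core idea of choosing among candidate learners by V-fold cross-validated risk (a "discrete" super learner that picks the single best candidate rather than a weighted ensemble). It is written in Python with NumPy purely for illustration; the tlverse itself is a set of R packages, and every function name here (v_fold_indices, cv_risk, discrete_super_learner), the two candidate learners, and the simulated data are assumptions made for this example, not part of the course materials.

```python
import numpy as np

def v_fold_indices(n, v, seed=0):
    """Split row indices 0..n-1 into v roughly equal validation folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), v)

def fit_mean(x, y):
    """Hypothetical candidate 1: ignore x and predict the training mean."""
    mu = y.mean()
    return lambda x_new: np.full(len(x_new), mu)

def fit_linear(x, y):
    """Hypothetical candidate 2: least-squares line y = a + b * x."""
    b, a = np.polyfit(x, y, deg=1)  # polyfit returns the slope first
    return lambda x_new: a + b * x_new

def cv_risk(fit, x, y, folds):
    """Cross-validated mean squared error of one candidate learner."""
    losses = []
    for val in folds:
        train = np.setdiff1d(np.arange(len(y)), val)
        predict = fit(x[train], y[train])          # fit on training folds only
        losses.append(np.mean((y[val] - predict(x[val])) ** 2))
    return float(np.mean(losses))

def discrete_super_learner(candidates, x, y, v=5):
    """Return the candidate with the smallest cross-validated risk."""
    folds = v_fold_indices(len(y), v)
    risks = {name: cv_risk(fit, x, y, folds) for name, fit in candidates.items()}
    best = min(risks, key=risks.get)
    return best, risks

# Simulated data with a strong linear signal (assumed for illustration).
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 3.0 * x + rng.normal(0, 1, size=200)

best, risks = discrete_super_learner({"mean": fit_mean, "linear": fit_linear}, x, y)
print(best, risks)
```

Squared error is used as the loss here only because the simulated outcome is continuous; a different loss function would be swapped into cv_risk for other prediction problems.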

About the Instructor

Mark van der Laan, PhD, is Professor of Biostatistics and Statistics at UC Berkeley. His research group developed loss-based super learning in semiparametric models, based on cross-validation, as a generic optimal tool for the estimation of infinite-dimensional parameters, such as nonparametric density estimation and prediction with censored data. Building on this work, Mark's research group developed targeted minimum loss-based estimation as a general optimal methodology for statistical and causal inference. Recently, his group has worked towards developing a principled set of software tools for targeted learning, the tlverse.

Alan Hubbard, PhD, is Professor of Biostatistics. Research in Alan's group is generally motivated by applied problems in computational biology, epidemiology, and precision medicine.

This short course will also be instructed by Jeremy Coyle, PhD, a consulting data scientist who is leading the software development effort that has produced the tlverse ecosystem of R packages. Since the development of this workshop was a joint effort, the following PhD students in biostatistics will also co-instruct: Nima Hejazi, Ivana Malenica, and Rachael Phillips.

Relevance to Conference Goals

This full-day short course will provide participants with practical knowledge about analyzing data of various forms through the application of targeted learning, a state-of-the-art statistical method. Guided by R programming exercises, case studies, and intuitive explanation, participants will build a toolbox for applying the targeted learning statistical methodology, which will translate to real-world causal inference and statistical analyses. We will feature a diversity of data, relevant to a broad range of applied statisticians.

The overall objective of this course is to provide training to students, researchers, industry professionals, and faculty in science, public health, statistics, and other fields to empower them with the necessary knowledge and skills to utilize the sound methodology of Targeted Learning, a technique that provides tailored pre-specified machines for answering queries, so that each data analysis is completely reproducible, and estimators are efficient, minimally biased, and provide formal statistical inference. This objective aligns with the conference goals, and we therefore believe that this full-day short course is a good fit.

Thu, Feb 20
8:00 AM - 5:30 PM
Regency B

SC2 - Introduction to R: From Programming to Tidying to Analysis
Short Course (full day)

Instructor(s): Philip D. Waggoner, The University of Chicago

The use of R is rapidly increasing in all corners of data science and empirical research. This is for good reason as R is not only a fast and efficient programming language and environment for doing statistics and data analysis, but it is also free and open source. As such, this course will offer a high-level introduction to the statistical computing language of R from start to finish. We will cover a range of topics in "base R" as well as fold in the “tidy” approach to wrangling and visualization in R. The end result will be a fully equipped researcher/practitioner who can efficiently and effectively move from obtaining a messy, unorganized data set to a polished, presentable final product across a variety of domains and applications.

Outline & Objectives

The goals of the course are to get participants comfortable engaging in basic coding in R, wrangling and cleaning complex data, troubleshooting errors on their own, estimating widely used models, and transforming numerical output into visually pleasing figures. As the course is geared toward beginners, no prior coding experience (in or out of R) is assumed. We will start at the ground level to ensure that everyone is at the same place.

As a rough outline, we will cover:
1. Getting started with R and RStudio // Packages // Basic Programming
2. Loading, cleaning, and wrangling data
3. Statistics: widely-used model fitting, interpretation, diagnostics (T-tests, OLS, Binary Response and Count models)
4. Data Visualization: in Base R and the Tidyverse
5. (If time) Advanced Topics: Basic Webscraping and Text Analysis (preprocessing and wordclouds)

The goal is a high-level introduction to the practical use of R for a host of applications and fields. Thus, we start at the ground level, and no prerequisites or prior coding experience are necessary. Some level of basic applied statistics would be useful (but not required) to fully understand the model fitting portion.

About the Instructor

I have been using R professionally for many years and incorporated it in my Ph.D. dissertation. I have taught a semester-long version of this course to Master of Public Policy students at the College of William & Mary. I have also written and coauthored many R packages, and I am a member of "easystats," a software development group focused on writing packages to make statistics in R easy (https://github.com/orgs/easystats/people). In addition, a colleague (Ryan Kennedy, University of Houston) and I are writing a book introducing the Tidyverse version of R to the social science community. I already have scripts and many example datasets, as well as "worksheets" (.Rmd files), prepared for all units. These are available at my GitHub: https://github.com/pdwaggoner/Intro-to-R . Thus, I am prepared, experienced, and eager to present a high-level introduction to R to non-users or those wanting to widen their scope of statistical programming a bit more.

Relevance to Conference Goals

1. Learn statistical methods or programming techniques that apply to their job as applied statisticians: For this first goal, as this course is geared towards beginners, the assumption is that those who sign up will be eager to learn new techniques, which I will teach from start to finish. Further, I will give students sample data and R scripts for all topics so they can use, adapt, and extend these concepts in the future for their own purposes.

2. Better communicate and collaborate with their clients and customers: By learning these techniques, as well as how they fit into a broader framework of a consolidated research project, users will avoid the "piecemeal"/self-taught route of learning R which inevitably produces gaps in understanding. Instead, by taking this class, students will learn how all of these pieces (from wrangling to programming to fitting models and visualizing results) fit together and thus how they can best present information to interested parties.

3. Have a positive effect on their organization or enhance their professional development: The previous two goals being met, this third goal is a natural byproduct, where learning more == empowerment == excitement!

Thu, Feb 20
8:00 AM - 5:30 PM
Golden State

SC3 - Hands-On Introduction to Python in Data Science
Short Course (full day)

Instructor(s): Mei Najim, Advanced Analytics Consulting Services, LLC

This course is designed to provide a hands-on introduction to Python, the well-known open-source programming language for data science, including predictive modeling and data analysis. A case study using insurance data is employed in order to methodically expose attendees to data science best practices and hands-on experience in Python. Sample data and Python code are provided.

Outline & Objectives

Outline:
(1) Learn how Jupyter Notebooks work, and cover the basics of programming including data structures, data operations, if else statements, for and while loops, and logical operations, etc.
(2) An in-depth Predictive Analytics Case Study in Insurance

Learning Objectives: Get some hands-on experience in Python
(1) Learn how to explore and prepare data in Python
(2) Use a variety of statistical methods and machine learning algorithms (GLM, decision trees and random forests, neural nets) to build predictive models in Python.

Audiences: Statisticians in sectors such as manufacturing, pharmaceuticals, banking, and government agencies; statistical researchers/analysts in universities; graduate students in statistics departments.

Prerequisites: BS/MS level education in statistics or mathematics with some programming experience; Install Jupyter Notebooks.

About the Instructor

Mrs. Mei Najim provides advanced analytics consulting services to the Property & Casualty insurance industry, mainly in Strategic Planning (developing short-term and long-term advanced analytics strategic plans for the organization) and Advanced Analytics Capability Building (developing full life cycle analytics processes from raw data exploration to implementation of analytics solutions in IT data systems). Mei has 15 years of hands-on big data advanced analytics experience, including statistical methods, machine learning algorithms, and data mining, in the Property & Casualty insurance industry. She also has experience in catastrophic modeling, actuarial pricing, reserving, and R&D. Mei has frequently presented at conferences to share and further develop her expertise. Mei holds a BS degree in Actuarial Science from Hunan University and two MS degrees, one in Applied Mathematics and the other in Statistics, from Washington State University. Mei is a member of the American Statistical Association and a Certified Specialist in Predictive Analytics (CSPA) of the Casualty Actuarial Society.

Relevance to Conference Goals

The objective is to provide attendees with hands-on experience about data science, modeling, and analyzing data of various forms through the application of state-of-the-art statistical methods and machine learning algorithms in Python.

Thu, Feb 20
8:00 AM - 12:00 PM
Regency C

SC4 - Side-by-Side Learning of R and Python by Analyzing Big Longitudinal Data
Short Course (half day)

Instructor(s): Mohammed Rahim Uddin Chowdhury, Kennesaw State University

R and Python are two widely used open-source interpreted programming languages with large and diverse communities. Because they are open source, new libraries are continuously developed and added to their respective catalogs as new mathematical, statistical, or other models are introduced. R has more than 12,000 packages available on CRAN (an open-source repository), which researchers can use to perform whatever analysis they need. This rich variety of libraries makes R the first choice for statistical analysis, especially for specialized analytical work. Python, on the other hand, does not have as many packages for data analysis and data modeling. Most data science work can be done with five Python libraries: NumPy, pandas, SciPy, scikit-learn, and seaborn. However, the scientific community recognizes that Python is catching up with R by rapidly developing packages for data mining and statistical modeling. In this short course at CSP 2020, I will show in detail side-by-side comparisons between R and Python on six topics: data mining and data analysis, tests of hypotheses, correlation and regression, simulation, mathematical computations, and text mining.

Outline & Objectives

The outline of the short course is to discuss the application of R and Python on the problems of
1. Data mining and data analysis (consists of 50 different data mining problems)
2. Test of Hypotheses and confidence interval (consists of 20 different problems)
3. Regression models (16 different models will be discussed)
4. Simulations (9 different simulation designs will be discussed)
5. Mathematical Computations (50 different problems will be computed)
6. Text mining (word clouds, sentiment analysis, and graphs of the most frequently used words will be discussed)

The objective of this short course is to train participants to use R and Python side by side to solve problems from the topics above in their professional work. No prior knowledge of R or Python is required. The instructor will provide all the problems in an easily understandable question format together with R and Python code. The instructor will first discuss each problem and then run the R and Python code together with the participants.

About the Instructor

I obtained my PhD in Statistics in 2013 and have been a tenure-track Assistant Professor of Statistics in the Department of Statistics and Analytical Science at Kennesaw State University since August 2015. During my four years at KSU, I have taught ten unique undergraduate and graduate courses, which is more than two new courses per year. Five are undergraduate courses, ranging from introductory statistics to R and Python programming. I was motivated to teach Python programming because it is in high and growing demand in industry, and many employers want data engineers with expertise in Python. The other five are graduate courses. I taught a special topics course on theoretical and computational Bayesian statistics for graduate students, using R to teach computational parts such as the EM algorithm, MCMC, Gibbs sampling, the Metropolis algorithm, and the Metropolis-Hastings algorithm. Another graduate course is Applied Time Series Analysis. For most courses, I prefer to teach with R. I taught the undergraduate R programming course in Fall 2018 and a Python programming course in Spring 2019.

Relevance to Conference Goals

The Conference on Statistical Practice is usually considered a platform for applied researchers who use novel statistical and machine learning methods to solve data-driven problems. R and Python both offer packages for solving such problems. This short course will introduce both R and Python by analyzing a big longitudinal data set. In addition, various simulation designs and text mining will be discussed. This course will help anyone interested in learning R and Python from scratch.

Thu, Feb 20
8:00 AM - 12:00 PM
Regency D

SC5 - Essential Collaboration: The ASCCR Frame
Short Course (half day)

Instructor(s): Heather Smith, Cal Poly; Eric Vance, LISA-University of Colorado Boulder

Statisticians and data scientists often collaborate with domain experts from many different fields in academia, business, and government. Learning more effective collaboration skills will enable us to maximize our professional impact in these areas. In this short course, participants will learn and practice essential skills that will enable them to improve their collaborations and add more value to their projects, customers, and organizations. We introduce the ASCCR framework that describes our current best practices for five aspects of statistical consulting and collaboration (Attitude-Structure-Content-Communication-Relationship). Specifically, participants will learn how to establish foundational collaborative Attitudes, implement the POWER Structure for conducting effective meetings, apply the Q1Q2Q3 approach to consultations and collaborations, Communicate more effectively, and adopt practical strategies to strengthen Relationships. Participants will practice these skills via team exercises, role-plays, video coaching, and individual reflections to become more effective collaborators, allowing them to have greater impact in their roles as statisticians and data scientists.

Outline & Objectives

Our objective is to introduce key concepts that will help participants improve their collaboration skills so they can return to key roles within their organizations and achieve greater impact. This short course will be useful for all levels from beginning to advanced. Prerequisites are a desire to improve one’s personal effectiveness and openness to try new methods and ways of thinking in the practice of statistics and data science.

1 Welcome and warm-up team exercises
2 Introduction to ASCCR Frame
3 Attitude of effective collaboration (participants complete Attitude checklist)
4 POWER structure (Prepare-Open-Work-End-Reflect) and why we believe this structure produces effective meetings
5 Best practices for opening meetings (Eric and Heather mock role play, video review, then participants role play)
6 Best practices for ending meetings (Eric and Heather mock role play, video review)
Break
7 Q1Q2Q3 approach to the Content of statistical projects (reflection exercise)
8 Triangle of Statistical Communication (team discussion)
9 Tips for strengthening Relationships (reflection exercise)
10 Overall written reflection and individual plan for improving collaboration skills.

About the Instructor

For the past 11 years, Dr. Eric Vance, an Associate Professor at the University of Colorado Boulder, has been the director of LISA (Laboratory for Interdisciplinary Statistical Analysis) where he has trained 271 statisticians to move between theory and practice to collaborate with 9500+ domain experts to apply statistics and data science to answer their research or business questions. He has taught workshops and webinars on collaboration in nine countries around the world, including several in collaboration with Heather Smith.

Heather Smith has 28 years of experience consulting with academic, industrial, service, and government clients in the United States, Europe, and Asia. She began this work as a statistical consultant at Westat, Inc. For 21 years she has been a faculty member in the Statistics Department at Cal Poly San Luis Obispo where she consults with academic and private sector researchers and teaches a wide variety of applied statistics courses, including courses in statistical communication and consulting. She has offered over a dozen workshops, short courses, and webinars on these topics, and has trained hundreds of statistical collaborators.

Relevance to Conference Goals

This short course is relevant to all three main conference goals. Participants will learn new skills and practical tips to apply whenever they interact with another person in their job as an applied statistician. Participants will explicitly learn how to better communicate and collaborate with their clients and customers. Skills learned in the course will equip participants to have a positive impact on their organization and an upward career trajectory. Participants will return to their jobs with new ideas, techniques, and strategies to improve their ability to communicate and collaborate effectively, resulting in a greater impact on their organizations and increasing the overall impact of statistics and data science in the world at large.

A version of this course was taught at the 2018 CSP and received a high average rating of 4.63 out of 5 (n=8 responding out of 22 participants). The official qualitative feedback we received: “This course is essential for any statistician who needs to collaborate with people in other disciplines, or sell their business to clients. I very strongly recommend it.” Unofficial feedback was very positive as well.

Thu, Feb 20
1:30 PM - 5:30 PM
Regency C

SC6 - Increasing Business Impact Through Automated Reporting in R
Short Course (half day)

Instructor(s): John Ennis, Aigora

Effective communication of results is among the essential duties of the industrial statistician, but the sometimes tedious mechanics of report production together with the sheer volume of data that many statisticians now must process combine to make reporting design an afterthought in too many cases. In this half-day course, we review recent advances in automated report production that liberate resources for statisticians to focus on the interpretation and communication of results, while simultaneously reducing errors and increasing consistency of analyses. We teach the course through an extended example, cumulatively building an R script that takes participants from receipt of an example dataset to a beautifully designed and nearly completed PowerPoint presentation, automatically and using freely available, open-source packages. Details of how to customize the final presentation to incorporate corporate branding (such as logos, font choices, and color palettes) will also be covered.

Level: We recommend a minimal level of experience using R, RStudio, and the tidyverse.

Outline & Objectives

With this half-day course, we help industrial statisticians increase their business impact by leveraging tools for automated report production in R.

Topics covered include:

* What does automated reporting mean in practice?
* Scripting analyses, tables, and charts
* Automated production of PowerPoint presentations
* Building a "cookbook" of reporting recipes
* Font choices and color palettes
* Layering storytelling onto an automated report

About the Instructor

Dr. John Ennis is president of Aigora (www.aigora.com), a consulting and coaching organization dedicated to helping market researchers prepare for the rise of artificial intelligence. As part of this preparation, Aigora provides instruction in the automation of standard work practices, including report preparation. Dr. Ennis, a Ph.D. mathematician who conducted his postdoctoral training in computational neuroscience, has 11+ years of market research consulting experience, has presented at JSM and CSP, and will have presented at SDSS by the time of CSP 2020. In addition, Dr. Ennis is the author of over 30 peer-reviewed publications and two books on quantitative market research topics. Earlier this year, Dr. Ennis branched out from the Institute for Perception to found Aigora. In his prior work, Dr. Ennis was a well-reviewed instructor at dozens of short courses covering quantitative market research, including instruction on topics within data science. In his professional work, Dr. Ennis has used tools for automated reporting for approximately five years, and he now teaches such tools to his clients operating within a variety of enterprise-level businesses.

Relevance to Conference Goals

Through participation in this course, attendees will learn to support their internal clients with well-designed and easy-to-read reports they prepare quickly and can continually improve over time, building their credibility and influence within their organizations.

Thu, Feb 20
1:30 PM - 5:30 PM
Regency D

SC7 - Building LaTeX Templates for R Markdown to Produce Branded PDF Reports
Short Course (half day)

Instructor(s): Ben Barnard, Wells Fargo

Branded reports give a clean, clear, and consistent message for data science teams in an organization. We walk through the process of building a LaTeX template distributed through an R package. We begin with a short introduction to rmarkdown and some motivating examples for using branded reports. Then, we demonstrate from scratch how one can build a minimal LaTeX template and distribute it in an R package. We describe some best practices for branding and highlight the use of ggplot2 themes to match document branding. Finally, we walk through some further uses, such as parameterized reports and using the template for bookdown, and offer recommendations for deploying the R package at your company.

Outline & Objectives

The student should be able to walk away from this class with:
1. a general understanding of rmarkdown,
2. why it is important to have branded reports,
3. an R package with a LaTeX template that uses their company's branding,
4. understanding of best practices in branding,
5. use of ggplot2 themes,
6. and some possible further uses for the template and its distribution.

About the Instructor

Ben Barnard is a Data Scientist at Wells Fargo in the Team Member Insights group. Ben has a PhD from Baylor University in Statistics.

Jeff Idle is an Analytic Manager at Wells Fargo in the Team Member Insights group. Jeff leads the HR Advanced Analytics & Architecture team. Jeff is currently pursuing a MBA from the University of Minnesota's Carlson School of Management.

Relevance to Conference Goals

We stress using branded reports to communicate clean, clear, and consistent messages to your audience. Communication is the most important part of Data Science since decision makers are rarely analytic experts. Branded reports bring a certain professionalism that will be greatly appreciated by administration. Building the LaTeX templates saves time and makes sure every report comes out looking the same. Consistently branded reports allow your team to be recognized immediately by your work product.

Thu, Feb 20
5:30 PM - 7:00 PM
Regency EF

PS1 - Poster Session 1 and Opening Mixer
Poster Session

Chair(s): Alek Kotolyan, dot818

Htest.Clust: An R Package for Marginal Inference of Clustered Data with Cluster and Group Size Informativeness
Mary Gregg, University of Louisville

WITHDRAWN: A Comparison of Linear Mixed Effect Model and REEM Tree for Prediction of Cognitive Decline

Detecting Fake Images via Multiscale Methods in High-Dimensional Data
Hee-Seok Oh, Seoul National University; Minsu Park, Samsung Medical Center

Interval-Censored Survival Analysis: A Practical and Underused Tool for Observational Data
Travis Snyder, Imgen, SimonMed, Touro University Nevada; Cheryl Vanier, Touro University Nevada

Exploring 2- and 3-Way Interactions in Regression Models with R and RFSA
Joshua Lambert, University of Cincinnati

Impact of Baseline Covariate Imbalance on Bias in Treatment Effect Estimation in Cluster Randomized Trials: Race as an Example
Siyun Yang, Duke University

Using Poisson Binomial Models to Reveal Voter Preferences
Evan Taylor Ragosa Rosenman, Stanford University

PICC: A Peer-Run Support Model for Statistical Collaborators
Amy Michiko Lehman, The Ohio State University; Julie Ann Stephens, The Ohio State University

Developing a Classification System Within a Federal Statistical Agency
Darius Singpurwalla, National Center for Science and Engineering Statistics

De-Duplication Strategies in Mobile Health Clinical Studies
Ariadna Garcia, Stanford University; Vidhya Balasubramanian, Stanford University Quantitative Sciences Unit; Haley Hedlin, Stanford University Quantitative Sciences Unit

Longitudinal Parallel-Process Modeling: An Application to a Study of Pediatric Stem Cell Transplant Patients
Paula M. Murray, Children's Hospital Los Angeles

Development of Novel Online Prognostic Tool to Predict Long-Term Survival After Liver Resection
Rittal Mehta, The Ohio State University James Cancer Center

Model-Based Standardization Using an Outcome Model with Random Effects
Zhongkai Wang, University of Florida

WITHDRAWN: From Data to Decision Support Tools (DST): Web-Based Applications for Stakeholder Education and Engagement in the Coeur D’Alene (CDA) River Basin Superfund Site

Interactive and Dynamic Statistical Reports Using R Shiny Apps Integrated with REDCap: Example from Arkansas Active Kids Study
Zhuopei Hu, University of Arkansas for Medical Sciences

WITHDRAWN: Data Visualization for Factor Analysis

Step-Wedge Design to Evaluate the Effectiveness of Opioid Prescribing Aids
Caroline Ledbetter, University of Colorado

Thu, Feb 20
5:30 PM - 7:00 PM
Regency EF

Exhibits Open
Exhibits

Friday, February 21

Fri, Feb 21
7:30 AM - 5:30 PM
Ballroom Foyer

Registration
Registration

Fri, Feb 21
7:30 AM - 8:30 AM
Regency EF

Continental Breakfast
Other

Fri, Feb 21
7:30 AM - 6:30 PM
Regency EF

Exhibits Open
Exhibits

Fri, Feb 21
8:00 AM - 9:00 AM
Regency BC

GS1 - Keynote Address
General Session

8:05 AM

The Ethical Statistician and Data Scientist
Wendy L. Martinez, Bureau of Labor Statistics

Fri, Feb 21
9:15 AM - 10:45 AM
Regency C

CS01 - Fit Data, Fit Analysis
Concurrent Session

Chair(s): Megan Elyse Lutz, University of Georgia

9:20 AM

Q&A with Wendy Martinez
Wendy L. Martinez, Bureau of Labor Statistics

10:05 AM

Quality as Fitness for Use
Andrea Lee Ness, Statistics Canada

Fri, Feb 21
9:15 AM - 10:45 AM
Regency B

CS02 - Feature Identification in Complex Multivariate Systems
Concurrent Session

Chair(s): Cheryl Vanier, Touro University Nevada

9:20 AM

Mediation Analysis with an Ordinal Outcome Using Empirical Data
Kristina P Vatcheva, The University of Texas Rio Grande Valley

10:05 AM

A Severe Weather Index Based on the Historic National Oceanic and Atmospheric Administration (NOAA) Data
Thilini Vasana Mahanama, Texas Tech University

Fri, Feb 21
9:15 AM - 9:50 AM
Regency A

CS03 - The Birds and the ps
Concurrent Session

Chair(s): Chris Barker, Statistical Planning and Analysis Services, Inc.

9:20 AM

The birds and the ps. Giving THE TALK to collaborators.
Mary J Kwasny, Northwestern University

Fri, Feb 21
9:15 AM - 10:45 AM
Regency D

CS04 - Pipeline and Parallel Computing Using R
Concurrent Session

Chair(s): Frost Hubbard, Westat

9:20 AM

Data Delivery: From Paper to Pipeline Using R
Danielle Beaulieu, Origent Data Sciences

10:05 AM

Embarrassingly Parallel R: Getting Started with Parallelization in R
Jonathan Kane Storey, Mississippi State University Institute for Systems Engineering Research

Fri, Feb 21
10:00 AM - 12:30 PM
Regency A

CS05 - Adventuring Beyond P < 0.05
Concurrent Session

Chair(s): Zach Weller, Colorado State University

10:05 AM

Adventuring Beyond P < 0.05
View Presentation Karen Grace-Martin, The Analysis Factor; Tom Gwise, FDA; Megan Higgs, Independent consultant; Dan Jeske, UC-Riverside; Ruixiao Lu, Genomic Health; Wendy L. Martinez, Bureau of Labor Statistics

Fri, Feb 21
10:45 AM - 11:00 AM
Regency EF

Refreshment Break
Other

Fri, Feb 21
11:00 AM - 12:30 PM
Regency B

CS06 - Adventures in Regression
Concurrent Session

Chair(s): Qiao Ma, Google

11:05 AM

Multilevel Regression with Poststratification for Local Estimation: An Example and Lessons Learned
View Presentation Travis Loux, Saint Louis University

11:50 AM

Counterfactual Analysis of Cross-Sectional Data Using Quantile Process Regression
View Presentation Yonggang Yao, SAS Institute, Inc.

Fri, Feb 21
11:00 AM - 12:30 PM
Regency C

CS07 - Mining with Machine Learning
Concurrent Session

Chair(s): Lazarus K Mramba, University of Kansas Medical Center

11:05 AM

Statistical Image Processing for Machine Learning
View Presentation Vikram Krishnamurthy, Alliance Innovation Lab Silicon Valley

11:50 AM

Finding the Source of Grandma’s Chili: Investigative Text Mining
View Presentation Scott Lee Wise, SAS Institute, Inc.

Fri, Feb 21
11:00 AM - 12:30 PM
Regency D

CS08 - Making a Difference in the Real World? Applications of Meta-Analysis
Concurrent Session

Chair(s): Grant Innerst, Shippensburg University

11:05 AM

How to Use Meta-Analysis to Solve Real-World Problems?
Qing Wu, University of Nevada, Las Vegas

11:50 AM

WITHDRAWN: Thinking Statistically in Social Science and Humanities (SSH) Research: An Example of Meta-Analysis of Foreign Language Classroom Assessment Using R
SONGTAO WANG, University of Victoria

Fri, Feb 21
12:30 PM - 2:00 PM

Lunch (On Own)
Other

Fri, Feb 21
2:00 PM - 3:30 PM
Regency A

CS09 - Leading with Statistics
Concurrent Session

Chair(s): Jeffrey C. Farmer, New Orleans Baptist Theological Seminary

2:05 PM

Project Management for Statisticians
View Presentation Kathleen A Jablonski, The George Washington University

2:50 PM

Beyond Influencing, Toward Leading: Statisticians as Organization Leaders
View Presentation Monica L Johnston, M. Lee & Company

Fri, Feb 21
2:00 PM - 3:30 PM
Regency B

CS10 - Interval Estimation
Concurrent Session

Chair(s): Melissa Innerst, Juniata College

2:05 PM

Physically Plausible PDFs for Intervals Between Geyser Eruptions
View Presentation Gordon R Bower, Excelsior Statistics and Optimization

2:50 PM

Identifying Latent Effect in Time-Series Data with Applications to Problems in Economics and Veterinary Parasitology
Anand N Vidyashankar, George Mason University

Fri, Feb 21
2:00 PM - 3:30 PM
Regency C

CS11 - Big Data - Big Problems
Concurrent Session

Chair(s): Sumihiro Suzuki, UNT Health Science Center

2:05 PM

Innovative Approaches to Reduce Census Nonresponse Follow-Up
View Presentation Monique Sidebottom, Statistics Canada

2:50 PM

WITHDRAWN: Ensemble Imputation for DNA Methylation Levels Across Platforms
Gang Li, The University of North Carolina at Chapel Hill

Fri, Feb 21
2:00 PM - 3:30 PM
Regency D

CS12 - Toward Automation: Safety Studies and Dose-Finding Designs
Concurrent Session

Chair(s): Kim Love, K. R. Love Quantitative Consulting and Collaboration

2:05 PM

A Practical Approach to Achieving Person-Years Needed in Post-Marketing Long-Term Safety Studies
View Presentation Raghava Danwada, AbbVie

2:50 PM

Iadapt: An R Package and Shiny App for Simulation and Implementation of Early Phase Dose-Finding Designs Incorporating Toxicity and Continuous Efficacy Outcomes
View Presentation Alyssa M Vanderbeek, Columbia University

Fri, Feb 21
3:30 PM - 3:45 PM
Regency EF

Refreshment Break
Other

Fri, Feb 21
3:45 PM - 5:15 PM
Regency A

CS13 - Statistics in a Modern World
Concurrent Session

Chair(s): Thor D. Osborn, Sandia National Laboratories

3:50 PM

Ways to Increase Effectiveness and Reduce Feelings of Isolation When Working Remotely
View Presentation Frost Hubbard, Westat

4:35 PM

Developing the Modern Statistician in a National Statistical Office (NSO)
View Presentation Pierre Caron, Statistics Canada

Fri, Feb 21
3:45 PM - 5:15 PM
Regency B

CS14 - Communication with ADEPT and Methods for Sparse Data
Concurrent Session

Chair(s): Craig N. Refugio, Negros Oriental State University, Philippines

3:50 PM

The ADEPT Framework for Communicating Statistical Concepts to Non-Statisticians: A Mini-Workshop
View Presentation Karen Grace-Martin, The Analysis Factor

4:35 PM

Bayesian Analysis of Sparse Multivariate Matched Proportions Data
View Presentation Mark J Meyer, Georgetown University

Fri, Feb 21
3:45 PM - 5:15 PM
Regency C

CS15 - CSP Themes. Is it time to refocus? An interactive panel
Concurrent Session

3:50 PM

CSP Themes. Is it time to refocus? An interactive panel
David J. Corliss, Peace-Work; Mary J Kwasny, Northwestern University; Michael Regier, Verisk Analytics; Eric Vance, LISA-University of Colorado Boulder

Fri, Feb 21
3:45 PM - 5:15 PM
Regency D

CS16 - Data Visualization and Output for Reporting
Concurrent Session

Chair(s): Darius Singpurwalla, National Center for Science and Engineering Statistics

3:50 PM

SAS Output Delivery System (ODS): Practical Examples of Various Destinations: Excel, PDF, RTF, PowerPoint, and Output
View Presentation DANY GUERENDO CHRISTIAN, STATProg Inc.

4:35 PM

Data Visualization Using Power BI for Statistics Canada’s Monthly Survey of Manufacturing
View Presentation Michelle Caruso, Statistics Canada

Fri, Feb 21
5:15 PM - 6:30 PM
Regency EF

PS2 - Poster Session 2 and Refreshments
Poster Session

Chair(s): Alok Kumar Dwivedi, Texas Tech University

Monotonic Nonparametric Dose Response Model
Faten Alamri, Princess Nourah bint Abdulrahman University/Virginia Commonwealth University

Detecting Data Falsification in Surveys
Dhafer Malouche, Yale University

A Machine-Based Approach to Preoperatively Identify Patients with the Most and Least Benefit Associated with Resection for Intrahepatic Cholangiocarcinoma
Rittal Mehta, The Ohio State University James Cancer Center

Comparing Methods for Regression Models in Which the Dependent Variable Is Based on Estimates
View Presentation Yi Mu, Centers for Disease Control and Prevention

Zero-Inflated Covariates: Should We Care About It?
View Presentation Milan Bimali, University of Arkansas for Medical Sciences

An Algorithm for Post-Processing Medication Information Extracted by Natural Language Processing Systems from Electronic Health Records
View Presentation Leena Choi, Vanderbilt University Medical Center

Modeling and Forecasting Mortality Rates Data from Upper-Middle-Income Economies: A Machine Learning Approach
View Presentation Ahmad Talafha, Western Michigan University; Emmanuel Thompson, Southeast Missouri State University

Visualization of Affinity Maturation Based on Next-Generation Sequencing B Cell Receptor Sequencing Data
View Presentation Hai Yang, University of California, San Francisco

Adaptive 3D Segmentation of LiDAR Data for Object Detection and Localization for Autonomous Vehicles and Robots
Rita Chattopadhyay, Intel

Evaluation of Multivariate Classification Models for Analyzing NMR Metabolomics Data
Thao T. Vu, University of Nebraska - Lincoln

Classifying Symptom Trajectories in Patients with Mild Cognitive Impairment
View Presentation Sudeshna Paul, Emory University

Arcus Education: Children's Hospital of Philadelphia's Individualized, Quasi-Flipped, Modular, Up-Scaled Education Strategy
View Presentation Sheila Anne Braun, Children's Hospital of Philadelphia

A Computationally Efficient Method for Selecting a Split Questionnaire Design
View Presentation Matthew Stuart, Iowa State University

Modeling Longitudinal Change in Biomarkers in the Presence of Disease Treatment with Application to the Atherosclerosis Risk in Communities (ARIC) Study
View Presentation Nicole Butera, The George Washington University

Comparison of Methods to Analyze Clustered Time-to-Event Data with Competing Risks
View Presentation Wenhan A Lu, Yale University; Zehua Pan, Yale University

WITHDRAWN: Business Planning with Caseload Forecasting Models and Spatial Analysis

Using Machine Learning to Model Cancellation in Leadership Training Programs
View Presentation Sarah J. Pearsall, Center for Creative Leadership; Philip Turk, Western Data Analytics, LLC

Saturday, February 22

Sat, Feb 22
7:30 AM - 2:30 PM
Ballroom Foyer

Registration
Registration

Sat, Feb 22
7:30 AM - 1:00 PM
Regency EF

Exhibits Open
Exhibits

Sat, Feb 22
8:00 AM - 9:15 AM
Regency EF

PS3 - Poster Session 3 and Continental Breakfast
Poster Session

Chair(s): Sudeshna Paul, Emory University

Estimation of Semiparametric Functional-Coefficient Panel Data Models with Individual and Time Fixed Effects
Shaymal Halder, Auburn University

Comparative Analysis of the NHANES Public-Use and Restricted-Use Linked Mortality Files
View Presentation Suad El Burai Felix, National Center for Health Statistics

Visualizing Kurtosis
View Presentation Peter Westfall, Texas Tech University

The Importance of Timestamping for Data Integrity in Studies Incorporating Mobile Health Data
View Presentation Vidhya Balasubramanian, Stanford University Quantitative Sciences Unit; Ariadna Garcia, Stanford University; Haley Hedlin, Stanford University Quantitative Sciences Unit

An Evaluation of the Trimmed Means Approach in the Context of Randomized Controlled Trials
View Presentation Doug F Arbetter, Veristat

Principal Component-Guided Sparse Regression
View Presentation Jingyi Kenneth Tay, Stanford University

Subsemble Estimation for Spatial Count Data
Aimee Schwab-McCoy, Creighton University

Advancing Industry Statistical Methods Through Cross-Collaboration
View Presentation Katy Marie Wrenn, The Boeing Company

The Impact and Limitations of Stratified Imputation
View Presentation Dianna J Spence, University of North Georgia; Gregg A Velatini, University of North Georgia

Statistical Consulting and Collaboration at UNL: Past, Present, and Future
View Presentation Kelsey Nicole Karnik, University of Nebraska - Lincoln

Statistical Advocacy: Making Room for Myself at the Table
View Presentation Megan Elyse Lutz, University of Georgia

WITHDRAWN: The Application of Structural Equation Modeling in an Educational Setting

Interpreting Cluster Analysis Results: Using Relative Importance Methods as a Decision Aid
View Presentation Joseph Nicholas Luchman, Fors Marsh Group

Developing Data Models and Documentation for Complex Longitudinal Studies: Lessons Learned from NCANDA
Kevin M Cummins, University of California, San Diego

Persuading Investigators to Report Standardized Differences in Observational Studies
View Presentation Kyle Porter, The Ohio State University

Predictive Modeling for Over-Dispersed Proportion Data Using Some Completing Proportion Models
Nargis Akhter, Central CT State University

A Caution in the Use of Bootstrap Confidence Intervals
View Presentation Chuchu Cheng, Boston College

Turning Survey Data into Information and Infographics
View Presentation Diane M Hindmarsh, NSW Bureau of Health Information

A Comparison of Statistical Methods for Estimating Adjusted Hazard Ratio from the National Cancer Database
View Presentation Dongliang Wang, SUNY Upstate Medical University

Finding Optimal Cutoff Value Based on Inflated Mixture Distributions and Its Application to T Cell Repertoire Sequencing Data
View Presentation Jason Baik, San Francisco State University

Alternative to Hazard Ratio in Measuring the Between-Group Variation for Patients Undergoing Surgery
Rittal Mehta, The Ohio State University James Cancer Center

Sat, Feb 22
9:15 AM - 10:45 AM
Regency A

CS17 - Essential Collaboration Skills
Concurrent Session

Chair(s): Paul Berg, Eli Lilly & Company

9:20 AM

Essential Collaboration Skills: Attitude of Collaboration
View Presentation Eric Vance, LISA-University of Colorado Boulder

10:05 AM

Essential Collaboration Skills: Creating and Sustaining Productive Relationships
View Presentation Heather Smith, Cal Poly

Sat, Feb 22
9:15 AM - 10:45 AM
Regency B

CS18 - Statistical Methods in Health Care
Concurrent Session

Chair(s): Mohammed Rahim Uddin Chowdhury, Kennesaw State University

9:20 AM

A Comparison of Propensity Score Methods for a Diabetes Study at a Community Health Center
View Presentation Brandy Sinco, University of Michigan

10:05 AM

Efficient Nonparametric Estimation of Population Size from Incomplete Lists
Manjari Das, Carnegie Mellon University

Sat, Feb 22
9:15 AM - 10:45 AM
Regency C

CS19 - Taxonomy Stories: Human-Centered Classification
Concurrent Session

Chair(s): Joshua Lambert, University of Cincinnati

9:20 AM

Student Retention Modeling by Use of Pre- and Post-Enrollment Data
View Presentation Sima Sharghi, Bowling Green State University

10:05 AM

Multivariate Association Analysis with Correlated Traits in Families or Distantly Related Individuals
Souvik Seal, University of Minnesota

Sat, Feb 22
9:15 AM - 10:45 AM
Regency D

CS20 - Ethics Panel: Ethical Practices at the Intersection of Statistics and Public Service
Concurrent Session

Chair(s): David J. Corliss, Peace-Work

The focus for the Ethics Panel this year is the intersection of statistics and public service. This area raises ethics questions about the privacy of data in the public sphere, advising public sector agencies on ethical best practices, financial support from industry for statistical testing (e.g., pharma approvals), and working to do the best science in an increasingly politicized and polarized world.

Panelists:

Daniel Elchert, ASA Policy Fellow

Wendy Martinez, Bureau of Labor Statistics

Darius Singpurwalla, NSF/National Center for Science and Engineering Statistics

Sat, Feb 22
10:45 AM - 11:00 AM
Regency EF

Refreshment Break
Other

Sat, Feb 22
11:00 AM - 12:30 PM
Regency A

CS21 - Going Public
Concurrent Session

Chair(s): Jay Mandrekar, Division of Biomedical Statistics and Informatics, Mayo Clinic

11:05 AM

Open Source Contribution and the Effectiveness of Public Work
View Presentation Amy Yang, Groupon

11:50 AM

What to Expect When You're Not Expecting - Incident Sampling
View Presentation Victoria Cox, Dstl

Sat, Feb 22
11:00 AM - 12:30 PM
Regency B

CS22 - Markov Models
Concurrent Session

Chair(s): Steven B. Cohen, RTI International

11:05 AM

Revisiting the Gelman-Rubin Diagnostic
View Presentation Christina Phan Knudson, University of St. Thomas

11:50 AM

Bayesian Statistics in R
View Presentation Christina Phan Knudson, University of St. Thomas

Sat, Feb 22
11:00 AM - 12:30 PM
Regency D

CS23 - Real-World Applications
Concurrent Session

Chair(s): Michelle Sarah Livings, University of Southern California

11:05 AM

A Surveillance Data-Based Model System for Assessing the Effects of HIV Intervention and Prevention Strategies
View Presentation Timothy A Green, Centers for Disease Control and Prevention; H. Irene Hall, Centers for Disease Control and Prevention; Ruiguang Song, Centers for Disease Control and Prevention

11:50 AM

Site Selection and Statistical Learning
View Presentation Kevin Edward Stoll, Bowling Green State University

Sat, Feb 22
11:00 AM - 12:30 PM
Regency C

CS24 - Policy and Support for Practicing Statisticians
Concurrent Session

Chair(s): Ron Gangnon, University of Wisconsin-Madison

11:05 AM

PICC: A Peer-Run Support Model for Statistical Collaborators
View Presentation Amy Michiko Lehman, The Ohio State University

11:50 AM

Intersections of Statistical Practice, Policy, and Advocacy
Daniel Elchert, American Statistical Association

Sat, Feb 22
12:30 PM - 2:00 PM

Lunch (on own)
Other

Sat, Feb 22
2:00 PM - 4:00 PM
Regency B

PCD1 - Meta-Analysis Using Stata
Practical Computing Demo

Instructor(s): Houssein Assaad, StataCorp LLC

Organizer(s): Brooke Erchinger, StataCorp LLC

This workshop will cover the use of Stata to perform meta-analysis (MA), a statistical technique for combining the results from several similar studies. The course will provide a brief introduction to MA and will demonstrate how to perform MA in Stata 16. Stata’s new meta command offers full support for MA—from computing various effect sizes and producing basic meta-analytic summary and forest plots to accounting for between-study heterogeneity and potential publication bias. A number of case studies demonstrating how to conduct an MA within Stata will be provided. These examples will focus on the interpretation of MA under various models, meta-regression and its postestimation features, subgroup analysis, small-study effect and publication bias, and various types of forest, funnel, and other plots. No prior knowledge of Stata is required, but basic familiarity with MA will prove useful.

Outline & Objectives

Outline
This workshop is geared toward researchers wanting to perform MA and those who already
know about MA and wish to learn how to do it using Stata.
1. Brief overview of MA
2. Data setup and effect sizes using meta set and meta esize
• Effect sizes for binary data
• Effect sizes for continuous data
• Generic (precomputed) effect sizes
3. MA models
• Random-effects model (seven estimation methods)
• Fixed-effects model (Mantel–Haenszel and inverse-variance methods)
• Common-effect model (Mantel–Haenszel and inverse-variance methods)
4. Graphical and numerical MA summary using meta summarize and meta forestplot
• Standard MA
• Subgroup MA with one or many grouping variables
• Cumulative MA with and without stratification
5. Meta-regression
• Continuous and categorical moderators
• Fixed-effects and random-effects regression
• Multiplicative and additive residual heterogeneity
• Knapp–Hartung standard-error adjustment
• Postestimation features: prediction, bubble plots, etc.
6. Small-study effects and publication bias
• Standard and contour-enhanced funnel plots
• Traditional and random-effects versions of tests for funnel-plot asymmetry or
small-study effects
• Nonparametric trim-and-fill method
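As a rough illustration of the common-effect and random-effects models listed in the outline, the pooling arithmetic can be sketched in a few lines of Python. This is not Stata's meta command and not the instructor's material: the function names and the three study values are hypothetical, and the DerSimonian-Laird estimator is used here only as one well-known choice among the random-effects estimation methods.

```python
import math


def pool_inverse_variance(effects, variances, tau2=0.0):
    """Pool study effect sizes with weights 1 / (variance + tau2).

    tau2 = 0 gives the common-effect (inverse-variance) estimate;
    tau2 > 0 gives a random-effects estimate.
    """
    weights = [1.0 / (v + tau2) for v in variances]
    total = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, effects)) / total
    std_err = math.sqrt(1.0 / total)
    return estimate, std_err


def dersimonian_laird_tau2(effects, variances):
    """Method-of-moments (DerSimonian-Laird) between-study variance."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    common, _ = pool_inverse_variance(effects, variances)
    # Cochran's Q: weighted squared deviations from the common-effect estimate.
    q = sum(w * (y - common) ** 2 for w, y in zip(weights, effects))
    c = total - sum(w * w for w in weights) / total
    df = len(effects) - 1
    return max(0.0, (q - df) / c)


# Hypothetical effect sizes and sampling variances (illustrative only).
effects = [0.0, 0.3, 0.6]
variances = [0.04, 0.04, 0.04]

common_est, common_se = pool_inverse_variance(effects, variances)
tau2 = dersimonian_laird_tau2(effects, variances)
random_est, random_se = pool_inverse_variance(effects, variances, tau2)

print(f"common-effect:  {common_est:.3f} (SE {common_se:.3f})")
print(f"random-effects: {random_est:.3f} (SE {random_se:.3f}), tau2 = {tau2:.3f}")
```

With between-study heterogeneity present, the random-effects standard error is larger than the common-effect one, which is the practical difference the model choice makes to the pooled interval.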

Performance objectives
Participants of this workshop will walk away with the following knowledge:
• A brief overview of MA as a statistical procedure
• How to declare and compute effect sizes using meta set and meta esize
• How to summarize the meta-analytic results via meta summarize and meta forestplot
• How to interpret the results under different MA models
• How to address the problem of heterogeneity
• How to perform meta-regression using meta regress
• How to assess the validity of the MA against the threat of publication bias
• How to test for funnel-plot asymmetry using meta bias
• How to conduct a trim-and-fill analysis using meta trimfill
• How to differentiate between various reasons behind funnel-plot asymmetry

This presentation will provide methods and formulas and demonstrate how to perform MA
with real data. Participants who bring their own laptop will be able to interactively follow
along provided they have Stata 16 installed and a working Internet connection for downloading
datasets from http://www.stata-press.com. However, interactive participation is
not required. The notes will provide sufficient information to reproduce all analyses at the
attendees’ convenience.

About the Instructor

Houssein Assaad is a Senior Statistician and Software Developer at StataCorp LLC and
the primary developer of Stata’s MA suite. Houssein has a PhD in statistics from the
University of Texas at Dallas. He is a former research assistant professor at Texas A&M
University, where his research focused on longitudinal and functional data analysis.

Relevance to Conference Goals

This demonstration will provide researchers with the tools to use MA in real-world
applications. Participants will learn about MA as a statistical procedure and how to perform the
steps of MA in Stata.

Sat, Feb 22
2:00 PM - 4:00 PM
Carmel AB

PCD2 - Introducing the SAS BGLIMM Procedure for Bayesian Generalized Linear Mixed Models
Practical Computing Demo

Instructor(s): Amy Shi, SAS Institute, Inc.

Organizer(s): Fang Chen, SAS Institute, Inc.

SAS/STAT® 15.1 includes PROC BGLIMM, a new, high-performance, sampling-based procedure that provides full Bayesian inference for generalized linear mixed models (GLMMs). PROC BGLIMM models data from the exponential family distributions that have correlations or nonconstant variability; uses syntax similar to that of the MIXED and GLIMMIX procedures (the CLASS, MODEL, RANDOM, REPEATED, and ESTIMATE statements); deploys optimal sampling algorithms that are parallelized for performance; handles multilevel nested and non-nested random-effects models; and fits models to multivariate or longitudinal data with repeated measurements. PROC BGLIMM provides convenient access, with improved performance, to Bayesian analysis of complex mixed models that you could previously perform with the MCMC procedure. This workshop starts with a general discussion of Bayesian GLMM, then presents the important features of PROC BGLIMM, showing you how to use it for estimation, inference, and prediction.

Outline & Objectives

OUTLINE
1. Overview of Bayesian GLMM
2. Syntax and options of PROC BGLIMM
3. Demonstration of PROC BGLIMM through examples
3.1 Simple normal regression
3.2 Logistic regression with random intercepts
3.3 Normal regression with repeated measurements
3.4 Non-nested logistic random-effects model with prediction
3.5 Poisson regression with random effects
3.6 Repeated growth measurements with internal difference

Target Audience
This presentation is intended for a broad audience of statisticians who are interested in Bayesian inference for generalized linear mixed models. It would be helpful for attendees to have a basic understanding of normal regression analysis, generalized linear mixed models, and Bayesian methods, but it is not required.

LEARNING OUTCOMES
(a) Performance objectives
By attending this presentation, participants will improve their knowledge of generalized linear mixed models and Bayesian methods, and they will be able to use the BGLIMM procedure in SAS/STAT software to conduct Bayesian analyses.
(b) Content and instructional methods
The presentation will alternate between the use of slides and software demonstrations. Handouts given to attendees will cover both.

About the Instructor

Amy Shi is a senior research statistician developer in the Advanced Analytics Division at SAS Institute Inc. She received a Ph.D. in biostatistics from the University of North Carolina at Chapel Hill. She joined SAS in 2010, and her work involves implementation of Bayesian methods in software. Amy’s main responsibility is developing and enhancing SAS’ Bayesian capability, with a focus on generalized linear mixed models, discrete choice models, and multilevel hierarchical settings.

Relevance to Conference Goals

Sat, Feb 22
2:00 PM - 4:00 PM
Big Sur AB

PCD3 - AutoStat: A Single Application for Visualization, Data Querying, and Analytics Encompassing AI, Machine Learning, and Statistics
Practical Computing Demo

Organizer(s): Clair Alston-Knox, Predictive Analytics Group

Data is abundant in modern society, and a raft of statistical and machine learning algorithms has been developed to help researchers, managers and laypeople understand what inferences can be made from their data and what decisions would best advance their goals. And yet the current "p-value crisis in science" is evidence that, even in the scientific community, access to these sophisticated algorithms is not flowing through to many researchers, particularly those who do not have dedicated statistical or data science support.

The AutoStat Institute was founded by a group of academics and consultants who believe that this issue is, in large part, due to the need to code in programs like R or Python to gain access to these algorithms. While the operability of these and similar platforms continues to improve, there are many potential users of data who will never have the skill set, time, interest or level of exposure required to become au fait with these packages. As a result, many users are excluded from realising the potential of the Big Data World by a coding barrier. AutoStat addresses this problem by offering its users a modern GUI environment for sophisticated statistical analysis that aims to give academics, students, businesses and other interested people access to scalable modern algorithms and visualizations in a code-free environment.

Outline & Objectives

This 2-hour workshop will focus on the user experience and provide practical demonstrations of both business and research projects, covering practical implementations of:

Data management: Making new variables with the calculator tool, various methods for easy
data splitting test / train, merging datasets and much more.

Visualisations: Easy exploratory plots through to sophisticated layering approaches for publication and presentation quality output.

Model Building: A range of machine learning and statistical models (both frequentist and
Bayesian approaches)

Results and Inference: Standard outputs and tools to create users' own inference metrics

Team work: Project sharing and collaboration from early-stage data management through modeling
and report writing

Tutorials and other help facilities to enable the user to get full benefit from their data analysis.
We will then illustrate how the software can enhance the research or business output using real case studies and implementing the following tools:

Pipeline construction for ease of updating results as new data becomes available via easy point,
click and record;

Dashboard building for effective deployment to end users and broadening the reach of your
research;

Document builders that are available in AutoStat with a range of templates that can be
customized by the user.

About the Instructor

Dr Clair Alston-Knox is a Senior Statistician with Predictive Analytics Group (Melbourne, Australia). She has been a research and academic statistician since 1992, with a number of biometric and statistical consulting positions in government and universities. She joined Predictive Analytics and the AutoStat Institute in 2018 because her teaching, consulting, advising and ethics committee roles were frequently frustrated by researchers who were very capable of understanding the objectives and benefits of statistical or machine learning approaches, but did not have the resources to learn the platform required for next-level analysis.

Dr Theo Gazos is the Managing Director of Predictive Analytics Group. Theo has over 25 years of experience building economic and econometric models that isolate and quantify the impact of changing market dynamics (domestic and international), competition effects and government policy on private and government sector organisations. Theo is passionate about bringing the power of statistics and machine learning to all levels within organisations, and has used his years of experience to develop an interface and user flow within AutoStat® that makes this objective achievable.

Relevance to Conference Goals

Communication, collaboration and career development
AutoStat is an ideal environment for sophisticated statistical analysis, such as Bayesian models with stochastic search variable selection. The report building, collaboration and visualization features all assist users in communicating outcomes.

Data Modeling and Analysis, Data Science and Big Data
AutoStat will help different users of big data in many different ways. For example, the point-and-click nature of AutoStat will allow data analysts to perform sophisticated machine learning and produce the standard results by default without needing to implement code, decide on the most appropriate libraries or construct their own visualisations. The Bayesian models provided in AutoStat® are highly optimised and scalable to big data. Default settings are based on the latest research in the area of each model, are well documented and are prominently displayed so that users are aware of their settings (and can easily change them).

Software, Programming and Data Visualization
AutoStat provides modern graphics using drop and drag, with many customisable styles and the option of layering within charts. Users can produce high quality graphics without the need to code.

Sat, Feb 22
2:00 PM - 4:00 PM
Regency A

T1 - Applied Use of R, GitHub, and Markdown for Reproducible Workflows for Small Data Teams
Tutorial

Instructor(s): Karin Neff, BSD7

Download Handouts

Many organizations have limited personnel and resources available for building efficient data workflows. As organizations grow, solid documentation of processes, reproducible analyses and systemic collaboration tools become essential for maintaining efficient workflows.

This tutorial will walk through setting up documentation and reproducibility using R, GitHub and Markdown for emerging data scientists and small data teams. Participants will learn best practices for documentation and collaboration, and essential elements for reproducibility via hands-on training in RStudio.

Following this session, participants will have the tools to return to their organizations ready to build reproducible, documented data workflows.

Outline & Objectives

Students will obtain the following hands-on skills:
1. Foundational understanding of why documentation and reproducibility are important.
2. Setup and installation of required software to build workflows in RStudio and GitHub and documentation in R Markdown.
3. Understand the necessary components of reproducibility, including:
a. Identified data sources
b. Clear workflows and timelines
c. Version control and code
4. Understand the necessary components of documentation, including:
a. Metadata
b. Building organizational best-practices
c. The fundamentals of useful commenting
d. Combining narrative, code and documentation
e. Organizational transparency


About the Instructor

Dr. Karin Neff is the Data and Assessment specialist for Bozeman Public Schools where she works in a data-team of one to build data stories to aid in student growth and achievement. Dr. Neff relies heavily on open source tools to maintain analytic integrity and reproducibility in the public sector. Dr. Neff received her doctorate in Ecology and Environmental Sciences from Montana State University where she helped develop laboratory best practices, contributed to documentation strategies and mentored emerging scientists.

Relevance to Conference Goals

This course will provide an opportunity for emerging analysts to establish best practices in reproducibility and documentation that will serve them for their entire careers. It will also provide tools and information for organizations with small data teams to build workflows that will scale as their organizations and analytic needs grow.

Sat, Feb 22
2:00 PM - 4:00 PM
Regency C

T3 - Project Management Principles for Statisticians
Tutorial

Instructor(s): Ana H Valentin, Marymount University

Download Handouts

Project Management Institute (PMI) indicated that:
• 58% of organizations fully understand the value of project management
• 93% of organizations report using standardized project management practices
• 68% of organizations in PMI’s annual survey said that they used outsourced or contract project managers in 2018
• 23% of organizations use standardized project management practices across the entire organization
• 33% use standardized practices, but not across all departments
• 7% of organizations don't use any standard practices at all

Outline & Objectives

Scope: The goal of this workshop is to demonstrate how to apply the basic principles of the Project Management Institute's Body of Knowledge (PMBOK) in the workplace.
Objectives:
• Learn the basic PMBOK templates, such as charter, project plan, budget, risk management, and presentation;
• Understand how to use the basic PMBOK templates using Google Drive;
• Draft a charter, project plan, budget, risk management, and presentation on Google.
Benefits:
• Understand the principles of project management based on the PMBOK
• Learn how to apply basic project management tools such as project charter, project management plan, and risk management plan; and
• Draft a presentation for managers.
Level: Basic

Software: Google Drive

About the Instructor

Ana Valentín serves as an Enterprise Service Program Manager for the Enterprise Service Branch in the Service Delivery Division under the National Oceanic and Atmospheric Administration (NOAA) Office of the Chief Information Officer. In this capacity, Ana leads various teams on technology projects that strengthen NOAA’s mission. Ana promotes diversity and inclusion through the Latinos@NOAA Employees Resource Group (ERG), an organization that she co-founded in 2014 and a recipient of the 2018 NOAA Administrator’s Award. Ana taught undergraduate statistics and math courses and a graduate clinical research course for six years. Ana has also published research articles and has presented professional development workshops at the League of United Latin American Citizens Federal Training Institute national conferences. Ana has a BA and MPH from the University of Puerto Rico, an MS from the University of Fairfax, and graduate certificates from George Washington University, University of Maryland University College, and the United States Army War College. In her spare time, Ana collaborates with various nonprofits while pursuing a D.Sc. in Cybersecurity from Marymount University in Virginia.

Relevance to Conference Goals

- Better communicate and collaborate with their clients and customers
- Have a positive effect on their organization or enhance their professional

Sat, Feb 22
2:00 PM - 4:00 PM
Regency D

T4 - Introduction to Bayesian Data Analysis
Tutorial

Instructor(s): An-Ting Jhuang, UnitedHealth Group R&D; Christina Phan Knudson, University of St. Thomas

This two-hour course introduces Bayesian statistics at a level appropriate for all practitioners in both academia and industry. It covers fundamental Bayesian concepts, model creation, diagnostics, and interpretation of results.

Examples and sample code will develop participants’ intuition and practical abilities. Learners will understand the differences between frequentist statistics and Bayesian statistics; explain the importance and use of priors, posteriors and likelihoods; understand the use and function of Markov chain Monte Carlo (MCMC) methods; write R code to create Bayesian models; examine convergence of posterior samples; and integrate results into decision-making.

Participants will implement these skills with several examples using practical models (linear regression and logistic regression) with real-world data sets. This workshop will broaden participants’ skill sets for solving real-world problems.

Outline & Objectives

1. Intro to Bayesian concepts

2. Examples: coin flip, linear regression, logistic regression

3. Interpreting results

4. Conjugate priors

5. MCMC samplers: Why do we need them? How do they work?

6. MCMC convergence: definition, intuition, diagnostics, R code, packages

7. Larger example with survival data

--Basic goal: create a logistic regression to model the log odds of survival based on various predictors (e.g. gender, fare class, adult vs child)

--Intermediate: prediction

--Advanced: evaluate impact of the prior distribution, the Monte Carlo sample size, the inclusion/exclusion of variables

8. Review

Goal: introduce participants to the Bayesian statistical framework. Participants will understand and gain hands-on experience with priors, likelihoods, and posteriors; Markov chain Monte Carlo (MCMC) samplers; MCMC convergence; and the basic Bayesian workflow.
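As a rough illustration of outline items 2, 4, 5, and 6 (coin flip, conjugate priors, MCMC samplers, and convergence diagnostics), the sketch below works through a coin-flip model end to end. It is not the instructors' material: the tutorial itself uses R, while this sketch uses plain Python, and the function names, the uniform Beta(1, 1) prior, the data (7 heads, 3 tails), the step size, and the chain lengths are all assumptions chosen for illustration. The diagnostic shown is the classic Gelman-Rubin statistic, not the stabilized version mentioned in the instructor biography.

```python
import math
import random
import statistics


def beta_binomial_posterior(a, b, heads, tails):
    """Conjugate update: a Beta(a, b) prior plus coin-flip data gives a Beta posterior."""
    return a + heads, b + tails


def log_posterior(theta, a, b, heads, tails):
    """Unnormalized log posterior density of the heads probability theta."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return (a + heads - 1) * math.log(theta) + (b + tails - 1) * math.log(1.0 - theta)


def metropolis(n_draws, start, a, b, heads, tails, step=0.2, seed=0):
    """Random-walk Metropolis sampler; returns the list of draws of theta."""
    rng = random.Random(seed)
    theta = start
    current = log_posterior(theta, a, b, heads, tails)
    draws = []
    for _ in range(n_draws):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, a, b, heads, tails)
        # Accept with probability min(1, posterior ratio); otherwise keep theta.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        draws.append(theta)
    return draws


def gelman_rubin(chains):
    """Classic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(chain) for chain in chains]
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    between_over_n = statistics.variance(chain_means)  # B / n
    pooled = (n - 1) / n * within + between_over_n
    return math.sqrt(pooled / within)


if __name__ == "__main__":
    a, b, heads, tails = 1, 1, 7, 3  # hypothetical prior and data
    post_a, post_b = beta_binomial_posterior(a, b, heads, tails)
    print("exact posterior mean:", post_a / (post_a + post_b))

    # Three chains from dispersed starting points; drop the first 1000 draws as burn-in.
    chains = [
        metropolis(5000, start, a, b, heads, tails, seed=i)[1000:]
        for i, start in enumerate([0.1, 0.5, 0.9])
    ]
    pooled_draws = [draw for chain in chains for draw in chain]
    print("MCMC posterior mean:", statistics.fmean(pooled_draws))
    print("Gelman-Rubin statistic:", gelman_rubin(chains))
```

Because the Beta prior is conjugate to the coin-flip likelihood, the exact posterior is available in closed form, which gives a reference point for checking the sampler. A Gelman-Rubin value close to 1 suggests that the chains agree with one another, although it does not prove convergence.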

About the Instructor

An-Ting Jhuang holds a PhD in statistics from North Carolina State University. She has developed new Bayesian methods to tackle problems in epidemiology and material science. Her research focuses on sparse signal detection in spatial and spatiotemporal statistics, and exposure assessment. She is a Principal Data Scientist at UnitedHealth Group Research & Development in Minnesota. On a day-to-day basis, she identifies research directions and applies statistical methods to solve scientific and business questions in the health-care field.

Christina Knudson holds a PhD in statistics from the University of Minnesota. She is an assistant professor at the University of St. Thomas in Minnesota. She is the author and maintainer of the R package glmm, which is downloaded from CRAN over 1000 times per month. Her most recent contribution is “Revisiting the Gelman-Rubin Diagnostic” (Vats and Knudson), which stabilizes the Gelman-Rubin (GR) statistic, proposes a principled GR threshold for terminating samplers, and connects effective sample size to the GR statistic. Additionally, she is the organizer of the Twin Cities chapter of R Ladies.

Relevance to Conference Goals

Our goal of jump-starting participants’ Bayesian statistics abilities directly aligns with the conference goal of providing participants with the opportunity to learn new statistical methodologies and best practices in statistical analysis. We have designed this short course to broaden applied statisticians’ skill sets so that they can better consult with customers and organizations and help them solve real-world problems. Our short course will teach statistical techniques that participants can apply to their jobs as applied statisticians; participants will leave the workshop having practiced with several examples using practical models (linear regression and logistic regression) with real data sets.

Sat, Feb 22
4:00 PM - 4:15 PM
Regency EF

Refreshment Break
Other

Sat, Feb 22
4:15 PM - 5:30 PM
Regency A

GS2 - Closing Session
General Session

Chair(s): David J. Corliss, Peace-Work

The Closing Session is an opportunity for you to interact with the CSP Steering Committee in an open discussion about how the conference went and how it could be improved in future years. CSPSC 2021 vice chair, David J. Corliss, will lead a panel of committee members as they summarize their conference experience. The audience will then be invited to ask questions and provide feedback. The committee highly values suggestions for improvements gathered during this time. Each attendee will have an opportunity to win a door prize.

CSP Themes from Closing Session
David J. Corliss, Peace-Work; Mary J Kwasny, Northwestern University

American Statistical Association