Thursday, February 20
Thu, Feb 20
7:00 AM - 6:30 PM
Ballroom Foyer
Registration
Registration
Thu, Feb 20
8:00 AM - 5:30 PM
Regency A
SC1 - The Tlverse Software Ecosystem for Targeted Learning
Short Course (full day)
Instructor(s): Alan Hubbard, University of California, Berkeley; Mark van der Laan, University of California, Berkeley
This full-day short course will provide a comprehensive introduction to the field of targeted learning and the corresponding tlverse software ecosystem (https://github.com/tlverse). In particular, we will focus on targeted minimum loss-based estimators of causal effects, including those of static, dynamic, optimal dynamic, and stochastic interventions. These multiply robust, efficient plug-in estimators use state-of-the-art, ensemble machine learning tools to flexibly adjust for confounding while yielding valid statistical inference. In addition to discussion, this workshop will incorporate both interactive activities and hands-on, guided R programming exercises, to allow participants the opportunity to familiarize themselves with methodology and tools that will translate to real-world data analysis. It is highly recommended for participants to have an understanding of basic statistical concepts such as confounding, probability distributions, confidence intervals, hypothesis tests, and regression. Advanced knowledge of mathematical statistics may be useful but is not necessary. Familiarity with the R programming language will be essential.
Outline & Objectives
By the end of this course participants should be able to:
1. Discuss the utility of the robust estimation strategy of targeted learning in comparison to conventional techniques, which often rely on restrictive statistical models and may therefore lead to severely biased inference.
2. Utilize the super learner, a loss-function-based tool that uses V-fold cross-validation, to obtain the best prediction of the parameter of interest.
3. Calculate nonparametric variable importance metrics with both the super learner and targeted minimum loss-based estimators.
4. Estimate the causal effect of an intervention under static, dynamic, optimal individualized, and stochastic regimes using the tlverse.
5. Implement targeted minimum loss-based estimators when the outcome is subject to missingness, when mediators are present on the causal pathway, in high dimensions, and in studies with two-phase sampling.
6. Interpret the effect of interest under the real-world scenarios mentioned in learning objectives 4 and 5.
7. Construct novel targeted minimum loss-based estimators to extend the tlverse ecosystem of R packages.
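As an illustration of objective 2, the idea of choosing a learner by V-fold cross-validated loss can be sketched in a few lines. This is a minimal Python sketch under stated assumptions, not the tlverse API and not course material: the two candidate learners, the simulated data, and all function names are hypothetical. It implements only the "discrete" version, which selects the single candidate with the lowest cross-validated squared-error risk; the full super learner goes further and combines candidates with weights.

```python
import random
from statistics import fmean

def make_folds(n, v, seed=0):
    """Shuffle indices 0..n-1 and deal them into v validation folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::v] for i in range(v)]

def fit_mean(xs, ys):
    """Hypothetical candidate 1: always predict the training mean."""
    mu = fmean(ys)
    return lambda x: mu

def fit_linear(xs, ys):
    """Hypothetical candidate 2: simple least-squares line."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)

def cv_risk(fit, xs, ys, folds):
    """Average squared-error loss over the held-out folds."""
    fold_losses = []
    for val in folds:
        held_out = set(val)
        train = [i for i in range(len(ys)) if i not in held_out]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        fold_losses.append(fmean([(ys[i] - predict(xs[i])) ** 2 for i in val]))
    return fmean(fold_losses)

def discrete_super_learner(candidates, xs, ys, v=5):
    """Return the name of the lowest-risk candidate and all estimated risks."""
    folds = make_folds(len(ys), v)
    risks = {name: cv_risk(fit, xs, ys, folds) for name, fit in candidates.items()}
    return min(risks, key=risks.get), risks

# Simulated data (an assumption for illustration): y depends linearly on x.
rng = random.Random(1)
xs = [rng.random() for _ in range(200)]
ys = [2.0 * x + rng.gauss(0, 0.1) for x in xs]

best, risks = discrete_super_learner({"mean": fit_mean, "linear": fit_linear}, xs, ys)
print(best, risks)
```

Because every candidate is scored only on data it was not fit to, a flexible learner cannot win simply by overfitting the training folds.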
About the Instructor
Mark van der Laan, PhD, is Professor of Biostatistics and Statistics at UC Berkeley. His research group developed loss-based super learning in semiparametric models, based on cross-validation, as a generic optimal tool for the estimation of infinite-dimensional parameters, such as nonparametric density estimation and prediction with censored data. Building on this work, Mark's research group developed targeted minimum loss-based estimation as a general optimal methodology for statistical and causal inference. Recently, his group has worked towards developing a principled set of software tools for targeted learning, the tlverse.
Alan Hubbard, PhD, is Professor of Biostatistics. Research in Alan's group is generally motivated by applied problems in computational biology, epidemiology, and precision medicine.
This short course will also be instructed by Jeremy Coyle, PhD, a consulting data scientist who is leading the software development effort that has produced the tlverse ecosystem of R packages. Since the development of this workshop was a joint effort, the following PhD students in biostatistics will also co-instruct: Nima Hejazi, Ivana Malenica, and Rachael Phillips.
Relevance to Conference Goals
This full-day short course will provide participants with practical knowledge about analyzing data of various forms through the application of targeted learning, a state-of-the-art statistical method. Guided by R programming exercises, case studies, and intuitive explanation, participants will build a toolbox for applying the targeted learning statistical methodology, which will translate to real-world causal inference and statistical analyses. We will feature a diversity of data, relevant to a broad range of applied statisticians.
The overall objective of this course is to provide training to students, researchers, industry professionals, and faculty in science, public health, statistics, and other fields, empowering them with the knowledge and skills needed to use the sound methodology of Targeted Learning, a technique that provides tailored, pre-specified machines for answering queries so that each data analysis is completely reproducible and estimators are efficient, minimally biased, and provide formal statistical inference. This objective aligns with the conference goals, and we therefore believe this course is a good fit for a full-day short course.
Thu, Feb 20
8:00 AM - 5:30 PM
Regency B
SC2 - Introduction to R: From Programming to Tidying to Analysis
Short Course (full day)
Instructor(s): Philip D. Waggoner, The University of Chicago
The use of R is rapidly increasing in all corners of data science and empirical research. This is for good reason as R is not only a fast and efficient programming language and environment for doing statistics and data analysis, but it is also free and open source. As such, this course will offer a high-level introduction to the statistical computing language of R from start to finish. We will cover a range of topics in "base R" as well as fold in the “tidy” approach to wrangling and visualization in R. The end result will be a fully equipped researcher/practitioner who can efficiently and effectively move from obtaining a messy, unorganized data set to a polished, presentable final product across a variety of domains and applications.
Outline & Objectives
The goals of the course are to get participants comfortable engaging in basic coding in R, wrangling and cleaning complex data, troubleshooting errors on their own, estimating widely used models, and transforming numerical output into visually pleasing figures. As the course is geared toward beginners, no prior coding experience (in or out of R) is assumed. We will start at the ground level to ensure that everyone is at the same place.
As a rough outline, we will cover:
1. Getting started with R and RStudio // Packages // Basic Programming
2. Loading, cleaning, and wrangling data
3. Statistics: widely-used model fitting, interpretation, diagnostics (T-tests, OLS, Binary Response and Count models)
4. Data Visualization: in Base R and the Tidyverse
5. (If time) Advanced Topics: Basic Webscraping and Text Analysis (preprocessing and wordclouds)
The goal is a high-level introduction to the practical use of R for a host of applications and fields. Thus, we start at the ground level, and no prerequisites or prior coding experience are necessary. Some basic applied statistics would be useful (but is not required) for fully understanding the model fitting portion.
About the Instructor
I have been using R professionally for many years, and incorporated it in my Ph.D. dissertation. I have taught a semester-long version of this course to Master of Public Policy students at the College of William & Mary. I have also written and coauthored many R packages of my own, and I am a member of "easystats", a software development group focused on writing packages to make statistics in R easy (https://github.com/orgs/easystats/people). In addition, a colleague (Ryan Kennedy, University of Houston) and I are writing a book introducing the Tidyverse version of R to the social science community. I already have scripts and many example datasets, as well as "worksheets" (.Rmd files), prepared for all units. These are available at my GitHub: https://github.com/pdwaggoner/Intro-to-R . Thus, I am prepared, experienced, and eager to present a high-level introduction to R to non-users or those wanting to widen their scope of statistical programming a bit more.
Relevance to Conference Goals
1. Learn statistical methods or programming techniques that apply to their job as applied statisticians: As this course is geared towards beginners, the assumption is that those who sign up will be eager to learn new techniques, which I will teach from start to finish. Further, I will give students sample data and R scripts for all topics so they can use, adapt, and extend these concepts in the future for their own purposes.
2. Better communicate and collaborate with their clients and customers: By learning these techniques, as well as how they fit into a broader framework of a consolidated research project, users will avoid the "piecemeal"/self-taught route of learning R which inevitably produces gaps in understanding. Instead, by taking this class, students will learn how all of these pieces (from wrangling to programming to fitting models and visualizing results) fit together and thus how they can best present information to interested parties.
3. Have a positive effect on their organization or enhance their professional development: The previous two goals being met, this third goal is a natural byproduct, where learning more == empowerment == excitement!
Thu, Feb 20
8:00 AM - 5:30 PM
Golden State
SC3 - Hands-On Introduction to Python in Data Science
Short Course (full day)
Instructor(s): Mei Najim, Advanced Analytics Consulting Services, LLC
This course is designed to provide a hands-on introduction to Python, the well-known open-source programming language for data science, including predictive modeling and data analysis. A case study using insurance data is employed to methodically expose attendees to data science best practices and give them hands-on experience in Python. Sample data and Python code are provided.
Outline & Objectives
Outline:
(1) Learn how Jupyter Notebooks work, and cover the basics of programming including data structures, data operations, if else statements, for and while loops, and logical operations, etc.
(2) An in-depth Predictive Analytics Case Study in Insurance
Learning Objectives: Get some hands-on experience in Python
(1) Learn how to explore and prepare data in Python
(2) Use a variety of statistical methods and machine learning algorithms (GLM, decision trees and random forests, neural nets) to build predictive models in Python.
Audiences: Statisticians in sectors such as manufacturing, pharmaceuticals, banking, and government agencies; statistical researchers/analysts in universities; graduate students in statistics departments.
Prerequisites: BS/MS level education in statistics or mathematics with some programming experience; Install Jupyter Notebooks.
About the Instructor
Mrs. Mei Najim provides advanced analytics consulting services to the Property & Casualty insurance industry, mainly in Strategic Planning (developing short-term and long-term advanced analytics strategic plans for the organization) and Advanced Analytics Capability Building (developing full life cycle analytics processes, from raw data exploration to implementation of analytics solutions in IT data systems). Mei has 15 years of hands-on big data advanced analytics experience, including statistical methods, machine learning algorithms, and data mining, in the Property & Casualty insurance industry. She also has experience in catastrophe modeling, actuarial pricing, reserving, and R&D. Mei has frequently presented at conferences to share and further develop her expertise. Mei holds a BS degree in Actuarial Science from Hunan University and two MS degrees, one in Applied Mathematics and the other in Statistics, from Washington State University. Mei is a member of the American Statistical Association and a Certified Specialist in Predictive Analytics (CSPA) of the Casualty Actuarial Society.
Relevance to Conference Goals
The objective is to provide attendees with hands-on experience about data science, modeling, and analyzing data of various forms through the application of state-of-the-art statistical methods and machine learning algorithms in Python.
Thu, Feb 20
8:00 AM - 12:00 PM
Regency C
SC4 - Side-by-Side Learning of R and Python by Analyzing Big Longitudinal Data
Short Course (half day)
Instructor(s): Mohammed Rahim Uddin Chowdhury, Kennesaw State University
R and Python are two widely used open-source interpreted programming languages with large and diverse communities. Because both are open source, new libraries are continuously developed and added to their respective catalogs as new mathematical, statistical, or other models are discovered. R has more than 12,000 packages available on CRAN (its open-source repository), which researchers can use to perform whatever analysis they need. This rich variety of libraries makes R the first choice for statistical analysis, especially for specialized analytical work. Python, on the other hand, does not have as many packages for data analysis and data modeling. Most data science work can be done with five Python libraries: NumPy, pandas, SciPy, scikit-learn, and seaborn. However, it is known in the scientific community that Python is catching up with R by rapidly developing packages for data mining and statistical modeling. In this short course at CSP 2020, I will show in detail side-by-side comparisons between R and Python on six topics: data mining and data analysis, tests of hypotheses, correlation and regression, simulation, mathematical computations, and text mining.
Outline & Objectives
The outline of the short course is to discuss the application of R and Python on the problems of
1. Data mining and data analysis (consists of 50 different data mining problems)
2. Tests of hypotheses and confidence intervals (consists of 20 different problems)
3. Regression models (16 different models will be discussed)
4. Simulations (9 different simulation designs will be discussed)
5. Mathematical Computations (50 different problems will be computed)
6. Text mining (word clouds, sentiment analysis, and graphs of the most frequently used words will be discussed)
The objective of this short course is to train participants to use R and Python side by side to solve problems from the topics above in their professional work. Participants are not required to have prior knowledge of R or Python. The instructor will provide all the problems in an easily understandable question format together with R and Python code. First, the instructor will discuss the problems, and then he will run the R and Python code together with the participants.
About the Instructor
I obtained my PhD in Statistics in 2013 and have been working as a tenure-track Assistant Professor of Statistics in the Department of Statistics and Analytical Science at Kennesaw State University since August 2015. During my four years at KSU, I have taught ten unique undergraduate and graduate courses, which is more than two new courses per year. Five are undergraduate courses, ranging from introductory statistics to R and Python programming. I was motivated to teach Python programming because it is in high and growing demand in industry, and many employers want data engineers with expertise in Python. The other five are graduate courses. I taught a special-topics course on theoretical and computational Bayesian statistics for graduate students, using the R programming language to teach computational parts such as the EM algorithm, MCMC, Gibbs sampling, the Metropolis algorithm, and the Metropolis-Hastings algorithm. Another graduate course is Applied Time Series Analysis. For teaching most courses, I prefer the R programming language. I taught the undergraduate R programming course in Fall 2018 and a Python programming course in Spring 2019.
Relevance to Conference Goals
The Conference on Statistical Practice is usually considered a platform for applied researchers who use novel statistical and machine learning methods to solve data-driven problems. R and Python both have built-in packages for solving such problems. This short course will introduce both R and Python for analyzing big longitudinal data. In addition, various simulation designs and text mining will be discussed. This course will help anyone interested in learning R and Python from scratch.
Thu, Feb 20
8:00 AM - 12:00 PM
Regency D
SC5 - Essential Collaboration: The ASCCR Frame
Short Course (half day)
Instructor(s): Heather Smith, Cal Poly; Eric Vance, LISA-University of Colorado Boulder
Statisticians and data scientists often collaborate with domain experts from many different fields in academia, business, and government. Learning more effective collaboration skills will enable us to maximize our professional impact in these areas. In this short course, participants will learn and practice essential skills that will enable them to improve their collaborations and add more value to their projects, customers, and organizations. We introduce the ASCCR framework that describes our current best practices for five aspects of statistical consulting and collaboration (Attitude-Structure-Content-Communication-Relationship). Specifically, participants will learn how to establish foundational collaborative Attitudes, implement the POWER Structure for conducting effective meetings, apply the Q1Q2Q3 approach to consultations and collaborations, Communicate more effectively, and adopt practical strategies to strengthen Relationships. Participants will practice these skills via team exercises, role-plays, video coaching, and individual reflections to become more effective collaborators, allowing them to have greater impact in their roles as statisticians and data scientists.
Outline & Objectives
Our objective is to introduce key concepts that will help participants improve their collaboration skills so they can return to key roles within their organizations and achieve greater impact. This short course will be useful for all levels from beginning to advanced. Prerequisites are a desire to improve one’s personal effectiveness and openness to try new methods and ways of thinking in the practice of statistics and data science.
1 Welcome and warm-up team exercises
2 Introduction to ASCCR Frame
3 Attitude of effective collaboration (participants complete Attitude checklist)
4 POWER structure (Prepare-Open-Work-End-Reflect) and why we believe this structure produces effective meetings
5 Best practices for opening meetings (Eric and Heather mock role play, video review, then participants role play)
6 Best practices for ending meetings (Eric and Heather mock role play, video review)
Break
7 Q1Q2Q3 approach to the Content of statistical projects (reflection exercise)
8 Triangle of Statistical Communication (team discussion)
9 Tips for strengthening Relationships (reflection exercise)
10 Overall written reflection and individual plan for improving collaboration skills.
About the Instructor
For the past 11 years, Dr. Eric Vance, an Associate Professor at the University of Colorado Boulder, has been the director of LISA (Laboratory for Interdisciplinary Statistical Analysis) where he has trained 271 statisticians to move between theory and practice to collaborate with 9500+ domain experts to apply statistics and data science to answer their research or business questions. He has taught workshops and webinars on collaboration in nine countries around the world, including several in collaboration with Heather Smith.
Heather Smith has 28 years of experience consulting with academic, industrial, service, and government clients in the United States, Europe, and Asia. She began this work as a statistical consultant at Westat, Inc. For 21 years she has been a faculty member in the Statistics Department at Cal Poly San Luis Obispo where she consults with academic and private sector researchers and teaches a wide variety of applied statistics courses, including courses in statistical communication and consulting. She has offered over a dozen workshops, short courses, and webinars on these topics, and has trained hundreds of statistical collaborators.
Relevance to Conference Goals
This short course is relevant to all three main conference goals. Participants will learn new skills and practical tips to apply whenever they interact with another person in their job as an applied statistician. Participants will explicitly learn how to better communicate and collaborate with their clients and customers. Skills learned in the course will equip participants to have a positive impact on their organization and an upward career trajectory. Participants will return to their jobs with new ideas, techniques, and strategies to improve their ability to communicate and collaborate effectively, resulting in a greater impact on their organizations and increasing the overall impact of statistics and data science in the world at large.
A version of this course was taught at the 2018 CSP and received a high average rating of 4.63 out of 5 (n=8 responding out of 22 participants). The official qualitative feedback we received: “This course is essential for any statistician who needs to collaborate with people in other disciplines, or sell their business to clients. I very strongly recommend it.” Unofficial feedback was very positive as well.
Thu, Feb 20
1:30 PM - 5:30 PM
Regency C
SC6 - Increasing Business Impact Through Automated Reporting in R
Short Course (half day)
Effective communication of results is among the essential duties of the industrial statistician, but the sometimes tedious mechanics of report production, together with the sheer volume of data that many statisticians must now process, make report design an afterthought in too many cases. In this half-day course, we review recent advances in automated report production that free statisticians to focus on the interpretation and communication of results, while simultaneously reducing errors and increasing the consistency of analyses. We teach the course through an extended example, cumulatively building an R script that takes participants from receipt of an example dataset to a beautifully designed and nearly completed PowerPoint presentation, automatically and using freely available, open-source packages. Details of how to customize the final presentation to incorporate corporate branding, such as logos, font choices, and color palettes, will also be covered.
Level: We recommend a minimal level of experience using R, RStudio, and the tidyverse.
Outline & Objectives
With this half-day course, we help industrial statisticians increase their business impact by leveraging tools for automated report production in R.
Topics covered include:
* What does automated reporting mean in practice?
* Scripting analyses, tables, and charts
* Automated production of PowerPoint presentations
* Building a "cookbook" of reporting recipes
* Font choices and color palettes
* Layering storytelling onto an automated report
About the Instructor
Dr. John Ennis is president of Aigora (www.aigora.com), a consulting and coaching organization dedicated to helping market researchers prepare for the rise of artificial intelligence. As part of this preparation, Aigora provides instruction in the automation of standard work practices, including report preparation. Dr. Ennis, a Ph.D. mathematician who conducted his postdoctoral training in computational neuroscience, has 11+ years of market research consulting experience, has presented at JSM and CSP, and will have presented at SDSS by the time of CSP 2020. In addition, Dr. Ennis is the author of over 30 peer-reviewed publications and two books on quantitative market research topics. Earlier this year, Dr. Ennis branched out from the Institute for Perception to found Aigora - in his prior work, Dr. Ennis was a well-reviewed instructor at dozens of short courses covering quantitative market research, including instruction on topics within data science. In his professional work, Dr. Ennis has used tools for automated reporting for approximately five years, and he now teaches such tools to his clients operating within a variety of enterprise-level businesses.
Relevance to Conference Goals
Through participation in this course, attendees will learn to support their internal clients with well-designed and easy-to-read reports they prepare quickly and can continually improve over time, building their credibility and influence within their organizations.
Thu, Feb 20
1:30 PM - 5:30 PM
Regency D
SC7 - Building LaTeX Templates for R Markdown to Produce Branded PDF Reports
Short Course (half day)
Instructor(s): Ben Barnard, Wells Fargo
Branded reports give a clean, clear, and consistent message for data science teams in an organization. We walk through the process of building a LaTeX template distributed through an R package. We begin with a short introduction to rmarkdown and some motivating examples for using branded reports. Then, we demonstrate from scratch how to build a minimal LaTeX template and distribute it in an R package. We describe some best practices for branding and highlight the use of ggplot2 themes to match document branding. Finally, we walk through some further uses, such as parameterized reports, using the template for bookdown, and recommendations for deploying the R package at your company.
Outline & Objectives
Students should be able to walk away from this class with:
1. a general understanding of rmarkdown,
2. an understanding of why it is important to have branded reports,
3. an R package with a LaTeX template that uses their company's branding,
4. an understanding of best practices in branding,
5. experience using ggplot2 themes,
6. and some possible further uses for the template and its distribution.
About the Instructor
Ben Barnard is a Data Scientist at Wells Fargo in the Team Member Insights group. Ben has a PhD from Baylor University in Statistics.
Jeff Idle is an Analytic Manager at Wells Fargo in the Team Member Insights group. Jeff leads the HR Advanced Analytics & Architecture team. Jeff is currently pursuing an MBA from the University of Minnesota's Carlson School of Management.
Relevance to Conference Goals
We stress using branded reports to communicate clean, clear, and consistent messages to your audience. Communication is the most important part of data science, since decision makers are rarely analytic experts. Branded reports bring a professionalism that will be greatly appreciated by administration. Building the LaTeX templates saves time and ensures every report looks the same. Consistently branded reports allow your team to be recognized immediately by its work product.
Thu, Feb 20
5:30 PM - 7:00 PM
Regency EF
PS1 - Poster Session 1 and Opening Mixer
Poster Session
Chair(s): Alek Kotolyan, dot818
Thu, Feb 20
5:30 PM - 7:00 PM
Regency EF
Exhibits Open
Exhibits
Friday, February 21
Fri, Feb 21
7:30 AM - 5:30 PM
Ballroom Foyer
Registration
Registration
Fri, Feb 21
7:30 AM - 8:30 AM
Regency EF
Continental Breakfast
Other
Fri, Feb 21
7:30 AM - 6:30 PM
Regency EF
Exhibits Open
Exhibits
Fri, Feb 21
8:00 AM - 9:00 AM
Regency BC
GS1 - Keynote Address
General Session
Fri, Feb 21
9:15 AM - 10:45 AM
Regency C
CS01 - Fit Data, Fit Analysis
Concurrent Session
Chair(s): Megan Elyse Lutz, University of Georgia
Fri, Feb 21
9:15 AM - 10:45 AM
Regency B
CS02 - Feature Identification in Complex Multivariate Systems
Concurrent Session
Chair(s): Cheryl Vanier, Touro University Nevada
Fri, Feb 21
9:15 AM - 9:50 AM
Regency A
CS03 - The Birds and the ps
Concurrent Session
Chair(s): Chris Barker, Statistical Planning and Analysis Services, Inc.
Fri, Feb 21
9:15 AM - 10:45 AM
Regency D
CS04 - Pipeline and Parallel Computing Using R
Concurrent Session
Chair(s): Frost Hubbard, Westat
Fri, Feb 21
10:00 AM - 12:30 PM
Regency A
CS05 - Adventuring Beyond P < 0.05
Concurrent Session
Chair(s): Zach Weller, Colorado State University
10:05 AM
Adventuring Beyond P < 0.05
Karen Grace-Martin, The Analysis Factor; Tom Gwise, FDA; Megan Higgs, Independent consultant; Dan Jeske, UC-Riverside; Ruixiao Lu, Genomic Health; Wendy L. Martinez, Bureau of Labor Statistics
Fri, Feb 21
10:45 AM - 11:00 AM
Regency EF
Refreshment Break
Other
Fri, Feb 21
11:00 AM - 12:30 PM
Regency B
CS06 - Adventures in Regression
Concurrent Session
Chair(s): Qiao Ma, Google
Fri, Feb 21
11:00 AM - 12:30 PM
Regency C
CS07 - Mining with Machine Learning
Concurrent Session
Chair(s): Lazarus K Mramba, University of Kansas Medical Center
Fri, Feb 21
11:00 AM - 12:30 PM
Regency D
CS08 - Making a Difference in the Real World? Applications of Meta-Analysis
Concurrent Session
Chair(s): Grant Innerst, Shippensburg University
Fri, Feb 21
12:30 PM - 2:00 PM
Lunch (On Own)
Other
Fri, Feb 21
2:00 PM - 3:30 PM
Regency A
CS09 - Leading with Statistics
Concurrent Session
Chair(s): Jeffrey C. Farmer, New Orleans Baptist Theological Seminary
Fri, Feb 21
2:00 PM - 3:30 PM
Regency B
CS10 - Interval Estimation
Concurrent Session
Chair(s): Melissa Innerst, Juniata College
Fri, Feb 21
2:00 PM - 3:30 PM
Regency C
CS11 - Big Data - Big Problems
Concurrent Session
Chair(s): Sumihiro Suzuki, UNT Health Science Center
Fri, Feb 21
2:00 PM - 3:30 PM
Regency D
CS12 - Toward Automation: Safety Studies and Dose-Finding Designs
Concurrent Session
Chair(s): Kim Love, K. R. Love Quantitative Consulting and Collaboration
Fri, Feb 21
3:30 PM - 3:45 PM
Regency EF
Refreshment Break
Other
Fri, Feb 21
3:45 PM - 5:15 PM
Regency A
CS13 - Statistics in a Modern World
Concurrent Session
Chair(s): Thor D. Osborn, Sandia National Laboratories
Fri, Feb 21
3:45 PM - 5:15 PM
Regency B
CS14 - Communication with ADEPT and Methods for Sparse Data
Concurrent Session
Chair(s): Craig N. Refugio, Negros Oriental State University, Philippines
Fri, Feb 21
3:45 PM - 5:15 PM
Regency C
CS15 - CSP Themes. Is it time to refocus? An interactive panel
Concurrent Session
Fri, Feb 21
3:45 PM - 5:15 PM
Regency D
CS16 - Data Visualization and Output for Reporting
Concurrent Session
Chair(s): Darius Singpurwalla, National Center for Science and Engineering Statistics
Fri, Feb 21
5:15 PM - 6:30 PM
Regency EF
PS2 - Poster Session 2 and Refreshments
Poster Session
Chair(s): Alok Kumar Dwivedi, Texas Tech University
Saturday, February 22
Sat, Feb 22
7:30 AM - 2:30 PM
Ballroom Foyer
Registration
Registration
Sat, Feb 22
7:30 AM - 1:00 PM
Regency EF
Exhibits Open
Exhibits
Sat, Feb 22
8:00 AM - 9:15 AM
Regency EF
PS3 - Poster Session 3 and Continental Breakfast
Poster Session
Chair(s): Sudeshna Paul, Emory University
Sat, Feb 22
9:15 AM - 10:45 AM
Regency A
CS17 - Essential Collaboration Skills
Concurrent Session
Chair(s): Paul Berg, Eli Lilly & Company
Sat, Feb 22
9:15 AM - 10:45 AM
Regency B
CS18 - Statistical Methods in Health Care
Concurrent Session
Chair(s): Mohammed Rahim Uddin Chowdhury, Kennesaw State University
Sat, Feb 22
9:15 AM - 10:45 AM
Regency C
CS19 - Taxonomy Stories: Human-Centered Classification
Concurrent Session
Chair(s): Joshua Lambert, University of Cincinnati
Sat, Feb 22
9:15 AM - 10:45 AM
Regency D
CS20 - Ethics Panel: Ethical Practices at the Intersection of Statistics and Public Service
Concurrent Session
Chair(s): David J. Corliss, Peace-Work
The focus of the Ethics Panel this year is the intersection of statistics and public service. This area raises ethics questions about the privacy of data in the public sphere, advising public sector agencies on ethical best practices, financial support from industry for statistical testing (e.g., pharma approvals), and working to do the best science in an increasingly politicized and polarized world.
Panelists:
Daniel Elchert, ASA Policy Fellow
Wendy Martinez, Bureau of Labor Statistics
Darius Singpurwalla, NSF/ National Center for Science and Engineering Statistics
Sat, Feb 22
10:45 AM - 11:00 AM
Regency EF
Refreshment Break
Other
Sat, Feb 22
11:00 AM - 12:30 PM
Regency A
CS21 - Going Public
Concurrent Session
Chair(s): Jay Mandrekar, Division of Biomedical Statistics and Informatics, Mayo Clinic
Sat, Feb 22
11:00 AM - 12:30 PM
Regency B
CS22 - Markov Models
Concurrent Session
Chair(s): Steven B. Cohen, RTI International
Sat, Feb 22
11:00 AM - 12:30 PM
Regency D
CS23 - Real-World Applications
Concurrent Session
Chair(s): Michelle Sarah Livings, University of Southern California
Sat, Feb 22
11:00 AM - 12:30 PM
Regency C
CS24 - Policy and Support for Practicing Statisticians
Concurrent Session
Chair(s): Ron Gangnon, University of Wisconsin-Madison
Sat, Feb 22
12:30 PM - 2:00 PM
Lunch (on own)
Other
Sat, Feb 22
2:00 PM - 4:00 PM
Regency B
PCD1 - Meta-Analysis Using Stata
Practical Computing Demo
Instructor(s): Houssein Assaad, StataCorp LLC
Organizer(s): Brooke Erchinger, StataCorp LLC
This workshop will cover the use of Stata to perform meta-analysis (MA), a statistical technique for combining the results from several similar studies. The course will provide a brief introduction to MA and will demonstrate how to perform MA in Stata 16. Stata’s new meta command offers full support for MA—from computing various effect sizes and producing basic meta-analytic summary and forest plots to accounting for between-study heterogeneity and potential publication bias. A number of case studies demonstrating how to conduct an MA within Stata will be provided. These examples will focus on the interpretation of MA under various models, meta-regression and its postestimation features, subgroup analysis, small-study effect and publication bias, and various types of forest, funnel, and other plots. No prior knowledge of Stata is required, but basic familiarity with MA will prove useful.
Outline & Objectives
Outline
This workshop is geared toward researchers wanting to perform MA and those who already
know about MA and wish to learn how to do it using Stata.
1. Brief overview of MA
2. Data setup and effect sizes using meta set and meta esize
• Effect sizes for binary data
• Effect sizes for continuous data
• Generic (precomputed) effect sizes
3. MA models
• Random-effects model (seven estimation methods)
• Fixed-effects model (Mantel–Haenszel and inverse-variance methods)
• Common-effect model (Mantel–Haenszel and inverse-variance methods)
4. Graphical and numerical MA summary using meta summarize and meta forestplot
• Standard MA
• Subgroup MA with one or many grouping variables
• Cumulative MA with and without stratification
5. Meta-regression
• Continuous and categorical moderators
• Fixed-effects and random-effects regression
• Multiplicative and additive residual heterogeneity
• Knapp–Hartung standard-error adjustment
• Postestimation features: prediction, bubble plots, etc.
6. Small-study effects and publication bias
• Standard and contour-enhanced funnel plots
• Traditional and random-effects versions of tests for funnel-plot asymmetry or
small-study effects
• Nonparametric trim-and-fill method
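As a rough illustration of the pooling models listed in the outline above, the following Python sketch computes an inverse-variance common-effect estimate and a DerSimonian-Laird random-effects estimate from precomputed effect sizes. It is not Stata's meta implementation and is not taken from the workshop materials; the function name pool_effects and the example numbers are hypothetical and chosen only for illustration.

```python
import math


def pool_effects(effects, variances):
    """Pool per-study effect sizes with inverse-variance weights.

    Hypothetical helper for illustration only. Returns the common-effect
    estimate, Cochran's Q, the DerSimonian-Laird between-study variance
    (tau2), and the random-effects estimate with standard errors.
    """
    k = len(effects)
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    sum_w = sum(weights)
    common = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q: weighted squared deviations from the common estimate.
    q = sum(w * (y - common) ** 2 for w, y in zip(weights, effects))

    # DerSimonian-Laird moment estimator, truncated at zero.
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau2 to each within-study variance.
    weights_re = [1.0 / (v + tau2) for v in variances]
    sum_w_re = sum(weights_re)
    random_est = sum(w * y for w, y in zip(weights_re, effects)) / sum_w_re

    return {
        "common": common,
        "se_common": math.sqrt(1.0 / sum_w),
        "q": q,
        "tau2": tau2,
        "random": random_est,
        "se_random": math.sqrt(1.0 / sum_w_re),
    }


if __name__ == "__main__":
    # Made-up effect sizes and variances, not data from any real study.
    result = pool_effects([0.10, 0.35, 0.60], [0.04, 0.02, 0.05])
    for name, value in result.items():
        print(f"{name}: {value:.4f}")
```

When the estimated between-study variance is zero, the random-effects weights reduce to the common-effect weights, so the two pooled estimates coincide.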
Performance objectives
Participants of this workshop will walk away with the following knowledge:
• A brief overview of MA as a statistical procedure
• How to declare and compute effect sizes using meta set and meta esize
• How to summarize the meta-analytic results via meta summarize and meta forestplot
• How to interpret the results under different MA models
• How to address the problem of heterogeneity
• How to perform meta-regression using meta regress
• How to assess the validity of the MA against the threat of publication bias
• How to test for funnel-plot asymmetry using meta bias
• How to conduct a trim-and-fill analysis using meta trimfill
• How to differentiate between various reasons behind funnel-plot asymmetry
This presentation will provide methods and formulas and demonstrate how to perform MA
with real data. Participants who bring their own laptop will be able to interactively follow
along provided they have Stata 16 installed and a working Internet connection for
downloading datasets from http://www.stata-press.com. However, interactive participation is
not required. The notes will provide sufficient information to reproduce all analyses at the
attendees’ convenience.
About the Instructor
Houssein Assaad is a Senior Statistician and Software Developer at StataCorp LLC and
the primary developer of Stata’s MA suite. Houssein has a PhD in statistics from the
University of Texas at Dallas. He is a former research assistant professor at Texas A&M
University, where his research focused on longitudinal and functional data analysis.
Relevance to Conference Goals
This demonstration will provide researchers with the tools to use MA in real-world
applications. Participants will learn about MA as a statistical procedure and how to perform the
steps of MA in Stata.
Sat, Feb 22
2:00 PM - 4:00 PM
Carmel AB
PCD2 - Introducing the SAS BGLIMM Procedure for Bayesian Generalized Linear Mixed Models
Practical Computing Demo
Instructor(s): Amy Shi, SAS Institute, Inc.
Organizer(s): Fang Chen, SAS Institute, Inc.
SAS/STAT® 15.1 includes PROC BGLIMM, a new, high-performance, sampling-based procedure that provides full Bayesian inference for generalized linear mixed models (GLMMs). PROC BGLIMM models data from the exponential family distributions that have correlations or nonconstant variability; uses syntax similar to that of the MIXED and GLIMMIX procedures (the CLASS, MODEL, RANDOM, REPEATED, and ESTIMATE statements); deploys optimal sampling algorithms that are parallelized for performance; handles multilevel nested and non-nested random-effects models; and fits models to multivariate or longitudinal data with repeated measurements. PROC BGLIMM provides convenient access, with improved performance, to Bayesian analysis of complex mixed models that you could previously perform with the MCMC procedure. This workshop starts with a general discussion of Bayesian GLMM, then presents the important features of PROC BGLIMM, showing you how to use it for estimation, inference, and prediction.
Outline & Objectives
OUTLINE
1. Overview of Bayesian GLMM
2. Syntax and options of PROC BGLIMM
3. Demonstration of PROC BGLIMM through examples
3.1 Simple normal regression
3.2 Logistic regression with random intercepts
3.3 Normal regression with repeated measurements
3.4 Non-nested logistic random-effects model with prediction
3.5 Poisson regression with random effects
3.6 Repeated growth measurements with internal difference
Target Audience
This presentation is intended for a broad audience of statisticians who are interested in Bayesian inference for generalized linear mixed models. It would be helpful for attendees to have a basic understanding of normal regression analysis, generalized linear mixed models, and Bayesian methods, but it is not required.
LEARNING OUTCOMES
(a) Performance objectives
By attending this presentation, participants will improve their knowledge of generalized linear mixed models and Bayesian methods, and they will be able to use the BGLIMM procedure in SAS/STAT software to conduct Bayesian analyses.
(b) Content and instructional methods
The presentation will alternate between the use of slides and software demonstrations. Handouts given to attendees will cover both.
About the Instructor
Amy Shi is a senior research statistician developer in the Advanced Analytics Division at SAS Institute Inc. She received a Ph.D. in biostatistics from the University of North Carolina at Chapel Hill. She joined SAS in 2010, and her work involves implementation of Bayesian methods in software. Amy’s main responsibility is developing and enhancing SAS’ Bayesian capability, with a focus on generalized linear mixed models, discrete choice models, and multilevel hierarchical settings.
Relevance to Conference Goals
Sat, Feb 22
2:00 PM - 4:00 PM
Big Sur AB
PCD3 - AutoStat: A Single Application for Visualization, Data Querying, and Analytics Encompassing AI, Machine Learning, and Statistics
Practical Computing Demo
Organizer(s): Clair Alston-Knox, Predictive Analytics Group
Data is abundant in modern society, and a raft of statistical and machine learning algorithms has been developed to help researchers, managers, and laypeople understand what inferences can be made from their data and what decisions would best advance their goals. And yet, the current “p-value crisis in science” is evidence that, even in the scientific community, access to these sophisticated algorithms is not flowing through to many researchers, particularly those who do not have dedicated statistical or data science support.
The AutoStat Institute was founded by a group of academics and consultants who believe that this issue is, in a large part, due to the need to code in programs like R or Python to gain access to these algorithms. While the operability of these and similar platforms continues to improve, there are many potential users of data that will never have the skill set, time, interest or level of exposure required to become au fait with these packages. As a result, many users are excluded from realising the potential of the Big Data World by virtue of a coding barrier. AutoStat solves this problem by offering its users a modern feel GUI environment for sophisticated statistical analysis that aims to provide academics, students, business and interested people access to scalable modern algorithms and visualizations in a code free environment.
Outline & Objectives
This 2-hour workshop will focus on the user experience and provide practical demonstrations, drawn from both business and research projects, of the following:
Data management: Making new variables with the calculator tool, various methods for easy data splitting (test/train), merging datasets, and much more.
Visualisations: Easy exploratory plots through to sophisticated layering approaches for publication and presentation quality output.
Model Building: A range of machine learning and statistical models (both frequentist and Bayesian approaches).
Results and Inference: Standard outputs and tools to create users’ own inference metrics.
Team work: Project sharing and collaboration across early-stage data management, modeling, and report writing.
Tutorials and other help facilities to enable the user to get full benefit from their data analysis.
We will then illustrate how the software can enhance the research or business output using real case studies and implementing the following tools:
Pipeline construction for ease of updating results as new data becomes available via easy point, click and record;
Dashboard building for effective deployment to end users and broadening the reach of your research;
Document builders that are available in AutoStat with a range of templates that can be customized by the user.
About the Instructor
Dr Clair Alston-Knox is a Senior Statistician with Predictive Analytics Group (Melbourne, Australia). She has been a research and academic statistician since 1992, holding a number of biometric and statistical consulting positions in government and universities. She joined Predictive Analytics and the AutoStat Institute in 2018 because her teaching, consulting, advising and ethics committee roles were frequently frustrated by researchers who were very capable of understanding the objective and benefits of statistical or machine learning approaches, but did not have the resources to learn the required platform to enable next level analysis.
Dr Theo Gazos is the Managing Director of Predictive Analytics Group. Theo has over 25 years of experience building economic and econometric models that isolate and quantify the impact of changing market dynamics (domestic and international), competition effects and government policy on private and government sector organisations. Theo is passionate about bringing the power of statistics and machine learning to all levels within organisations, and has used his years of experience to develop an interface and user flow within AutoStat® that makes this objective achievable.
Relevance to Conference Goals
Communication, collaboration and career development
AutoStat is an ideal environment for sophisticated statistical analysis, such as Bayesian models with stochastic search variable selection. The report building, collaboration and visualization features all assist users in communicating outcomes.
Data Modeling and Analysis, Data Science and Big Data
AutoStat will help different users of big data in many different ways. For example, the point-and-click nature of AutoStat will allow data analysts to perform sophisticated machine learning and produce the standard results by default without needing to implement code, decide on the most appropriate libraries or construct their own visualisations. The Bayesian models provided in AutoStat® are highly optimised and scalable to big data. Default settings have been based on the latest research in the area of each model, are well documented and are prominently displayed so that users are aware of their settings (and can easily change them).
Software, Programming and Data Visualization
AutoStat provides modern graphics using drop and drag, with many customisable styles and the option of layering within charts. Users can produce high quality graphics without the need to code.
Sat, Feb 22
2:00 PM - 4:00 PM
Regency A
T1 - Applied Use of R, GitHub, and Markdown for Reproducible Workflows for Small Data Teams
Tutorial
Many organizations have limited personnel and resources available for building efficient data workflows. As organizations grow, solid documentation of processes, reproducible analyses, and systematic collaboration tools become essential for maintaining efficient workflows.
This tutorial will walk through setting up documentation and reproducibility using R, GitHub and Markdown for emerging data scientists and small data teams. Participants will learn best practices for documentation and collaboration, and essential elements for reproducibility via hands-on training in RStudio.
Following this session, participants will have the tools to return to their organizations ready to build reproducible, documented data workflows.
Outline & Objectives
Students will obtain the following hands-on skills:
1. Foundational understanding of why documentation and reproducibility are important.
2. Setup and installation of required software to build workflows in RStudio and GitHub and documentation in R Markdown.
3. Understand the necessary components of reproducibility, including:
a. Identified data sources
b. Clear workflows and timelines
c. Version control and code
4. Understand the necessary components of documentation, including:
a. Metadata
b. Building organizational best-practices
c. The fundamentals of useful commenting
d. Combining narrative, code and documentation
e. Organizational transparency
About the Instructor
Dr. Karin Neff is the Data and Assessment specialist for Bozeman Public Schools where she works in a data-team of one to build data stories to aid in student growth and achievement. Dr. Neff relies heavily on open source tools to maintain analytic integrity and reproducibility in the public sector. Dr. Neff received her doctorate in Ecology and Environmental Sciences from Montana State University where she helped develop laboratory best practices, contributed to documentation strategies and mentored emerging scientists.
Relevance to Conference Goals
This course will provide an opportunity for emerging analysts to establish best practices in reproducibility and documentation that will serve them for their entire careers. It will also provide tools and information for organizations with small data teams to build workflows that will scale as their organizations and analytic needs grow.
Sat, Feb 22
2:00 PM - 4:00 PM
Regency C
T3 - Project Management Principles for Statisticians
Tutorial
Project Management Institute (PMI) indicated that:
• 58% of organizations fully understand the value of project management
• 93% of organizations report using standardized project management practices
• 68% of organizations in PMI’s annual survey said that they used outsourced or contract project managers in 2018
• 23% of organizations use standardized project management practices across the entire organization
• 33% use standardized practices, but not across all departments
• 7% of organizations don't use any standard practices at all
Outline & Objectives
Scope: The goal of this workshop is to demonstrate how to apply the basic principles of the Project Management Institute's Body of Knowledge (PMBOK) in the workplace.
Objectives:
• Learn the basic PMBOK templates, such as charter, project plan, budget, risk management, and presentation;
• Understand how to use the basic PMBOK templates using Google drive;
• Draft a charter, project plan, budget, risk management, and presentation on Google.
Benefits:
• Understand the principles of project management based on the PMBOK
• Learn how to apply basic project management tools such as project charter, project management plan, and risk management plan; and
• Draft a presentation for managers.
Level: Basic
Software: Google drive
About the Instructor
Ana Valentín serves as an Enterprise Service Program Manager for the Enterprise Service Branch in the Service Delivery Division under the National Oceanic and Atmospheric Administration (NOAA) Office of the Chief Information Officer. In this capacity, Ana leads various technology project teams that strengthen NOAA’s mission. Ana promotes diversity and inclusion through the Latinos@NOAA Employee Resource Group (ERG), an organization that she co-founded in 2014 and a recipient of the 2018 NOAA Administrator’s Award. Ana taught undergraduate statistics and math courses and a graduate clinical research course for six years. Ana has also published research articles and has presented professional development workshops at the League of United Latin American Citizens Federal Training Institute national conferences. Ana has a BA and an MPH from the University of Puerto Rico, an MS from the University of Fairfax, and graduate certificates from George Washington University, University of Maryland University College, and the United States Army War College. In her spare time, Ana collaborates with various nonprofits while pursuing a D.Sc. in cybersecurity from Marymount University in Virginia.
Relevance to Conference Goals
• Better communicate and collaborate with their clients and customers
• Have a positive effect on their organization or enhance their professional
Sat, Feb 22
2:00 PM - 4:00 PM
Regency D
T4 - Introduction to Bayesian Data Analysis
Tutorial
Instructor(s): An-Ting Jhuang, UnitedHealth Group R&D; Christina Phan Knudson, University of St. Thomas
Download Handouts
This short course introduces Bayesian statistics at a level appropriate for all practitioners in both academia and industry. This two-hour course introduces fundamental Bayesian concepts, model creation, diagnostics, and interpretation of results.
Examples and sample code will develop participants’ intuition and practical abilities. Learners will understand the differences between frequentist statistics and Bayesian statistics; explain the importance and use of priors, posteriors and likelihoods; understand the use and function of Markov chain Monte Carlo (MCMC) methods; write R code to create Bayesian models; examine convergence of posterior samples; and integrate results into decision-making.
Participants will implement these skills with several examples using practical models (linear regression and logistic regression) with real-world data sets. This workshop will broaden participants’ skill-sets for solving real-world problems.
Outline & Objectives
1. Intro to Bayesian concepts
2. Examples: coin flip, linear regression, logistic regression
3. Interpreting results
4. Conjugate priors
5. MCMC samplers: Why do we need them? How do they work?
6. MCMC convergence: definition, intuition, diagnostics, R code, packages
7. Larger example with survival data
--Basic goal: create a logistic regression to model the log odds of survival based on various predictors (e.g. gender, fare class, adult vs child)
--Intermediate: prediction
--Advanced: evaluate impact of the prior distribution, the Monte Carlo sample size, the inclusion/exclusion of variables
8. Review
Goal: introduce participants to the Bayesian statistical framework. Participants will understand and gain hands-on experience with priors, likelihoods, and posteriors; Markov chain Monte Carlo (MCMC) samplers; MCMC convergence; and the basic Bayesian workflow.
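To make the coin-flip, conjugate-prior, and MCMC items in the outline concrete, here is a minimal Python sketch (the course itself uses R). It is not the instructors' code: the function names, the Beta(1, 1) prior, the data of 7 heads and 3 tails, and the sampler settings are all assumptions chosen for illustration. It compares the closed-form Beta posterior with draws from a simple random-walk Metropolis sampler targeting the same posterior.

```python
import math
import random


def beta_binomial_posterior(a, b, heads, tails):
    """Conjugate update: a Beta(a, b) prior plus coin-flip data gives a Beta posterior."""
    return a + heads, b + tails


def log_posterior(theta, a, b, heads, tails):
    """Unnormalized log posterior of the heads probability theta."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero posterior density outside (0, 1)
    return (a + heads - 1) * math.log(theta) + (b + tails - 1) * math.log(1.0 - theta)


def metropolis(a, b, heads, tails, n_draws=20000, step=0.1, seed=1):
    """Random-walk Metropolis sampler for the coin-flip posterior."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, a, b, heads, tails)
    draws = []
    for _ in range(n_draws):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, a, b, heads, tails)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        draws.append(theta)
    return draws


if __name__ == "__main__":
    a_post, b_post = beta_binomial_posterior(1, 1, heads=7, tails=3)
    exact_mean = a_post / (a_post + b_post)
    draws = metropolis(1, 1, heads=7, tails=3)[2000:]  # drop burn-in draws
    mcmc_mean = sum(draws) / len(draws)
    print(f"exact posterior mean: {exact_mean:.3f}")
    print(f"MCMC posterior mean:  {mcmc_mean:.3f}")
```

In this conjugate case the sampler is unnecessary, which is what makes it a useful check: the Monte Carlo mean should land close to the exact posterior mean, and the gap shrinks as the number of draws grows.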
About the Instructor
An-Ting Jhuang holds a PhD in statistics from North Carolina State University. She has developed new Bayesian methods to tackle problems in epidemiology and material science. Her research focuses on sparse signal detection in spatial and spatiotemporal statistics, and exposure assessment. She is a Principal Data Scientist at UnitedHealth Group Research & Development in Minnesota. On a day-to-day basis, she identifies research directions and applies statistical methods to solve scientific and business questions in the health-care field.
Christina Knudson holds a PhD in statistics from the University of Minnesota. She is an assistant professor at the University of St. Thomas in Minnesota. She is the author and maintainer of the R package glmm, which is downloaded from CRAN over 1000 times per month. Her most recent contribution is “Revisiting the Gelman-Rubin Diagnostic” (Vats and Knudson), which stabilizes the Gelman-Rubin (GR) statistic, proposes a principled GR threshold for terminating samplers, and connects effective sample size to the GR statistic. Additionally, she is the organizer of the Twin Cities chapter of R Ladies.
Relevance to Conference Goals
Our goal of jump-starting participants’ Bayesian statistics abilities directly aligns with the conference goal of providing participants with the opportunity to learn new statistical methodologies and best practices in statistical analysis. We have designed this short course to broaden applied statisticians’ skill sets so that they can better consult with customers and organizations and help them solve real-world problems. Our short course will teach statistical techniques that participants can apply to their jobs as applied statisticians; participants will leave the workshop having practiced with several examples using practical models (linear regression and logistic regression) with real data sets.
Sat, Feb 22
4:00 PM - 4:15 PM
Regency EF
Refreshment Break
Other
Sat, Feb 22
4:15 PM - 5:30 PM
Regency A
GS2 - Closing Session
General Session
Chair(s): David J. Corliss, Peace-Work
The Closing Session is an opportunity for you to interact with the CSP Steering Committee in an open discussion about how the conference went and how it could be improved in future years. CSPSC 2021 vice chair, David J. Corliss, will lead a panel of committee members as they summarize their conference experience. The audience will then be invited to ask questions and provide feedback. The committee highly values suggestions for improvements gathered during this time. Each attendee will have an opportunity to win a door prize.