Thursday, February 20
Thu, Feb 20
7:00 AM - 6:30 PM
Ballroom Foyer
Registration
Registration
Thu, Feb 20
8:00 AM - 5:30 PM
Regency A
SC1 - The Tlverse Software Ecosystem for Targeted Learning
Short Course (full day)
Instructor(s): Alan Hubbard, University of California, Berkeley; Mark van der Laan, University of California, Berkeley
This full-day short course will provide a comprehensive introduction to the field of targeted learning and the corresponding tlverse software ecosystem (https://github.com/tlverse). In particular, we will focus on targeted minimum loss-based estimators of causal effects, including those of static, dynamic, optimal dynamic, and stochastic interventions. These multiply robust, efficient plug-in estimators use state-of-the-art, ensemble machine learning tools to flexibly adjust for confounding while yielding valid statistical inference. In addition to discussion, this workshop will incorporate both interactive activities and hands-on, guided R programming exercises, to allow participants the opportunity to familiarize themselves with methodology and tools that will translate to real-world data analysis. It is highly recommended for participants to have an understanding of basic statistical concepts such as confounding, probability distributions, confidence intervals, hypothesis tests, and regression. Advanced knowledge of mathematical statistics may be useful but is not necessary. Familiarity with the R programming language will be essential.
Outline & Objectives
By the end of this course participants should be able to:
1. Discuss the utility of the robust estimation strategy of targeted learning in comparison to conventional techniques, which often rely on restrictive statistical models and may therefore lead to severely biased inference.
2. Utilize the super learner, a loss-function-based tool that uses V-fold cross-validation, to obtain the best prediction of the parameter of interest.
3. Calculate nonparametric variable importance metrics with both the super learner and targeted minimum loss-based estimators.
4. Estimate the causal effect of an intervention under static, dynamic, optimal individualized, and stochastic regimes using the tlverse.
5. Implement targeted minimum loss-based estimators when the outcome is subject to missingness, when mediators are present on the causal pathway, in high dimensions, and in studies with two-phase sampling.
6. Interpret the effect of interest under the real-world scenarios mentioned in learning objectives 4 and 5.
7. Construct novel targeted minimum loss-based estimators to extend the tlverse ecosystem of R packages.
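Objective 2 refers to choosing among candidate learners by V-fold cross-validated loss. As a rough illustration of that idea only, the sketch below implements a "discrete" super learner, which keeps the single candidate with the lowest cross-validated risk; a full super learner would instead fit a weighted combination of the candidates. It is written in standard-library Python rather than R, it does not use the tlverse packages, and every name in it (`fit_mean`, `fit_line`, `cv_risk`, `discrete_super_learner`) as well as the simulated data are assumptions made for this example.

```python
import random
from statistics import mean


def fit_mean(xs, ys):
    """Hypothetical candidate 1: ignore x and predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Hypothetical candidate 2: least-squares line y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def cv_risk(fit, xs, ys, v=5):
    """V-fold cross-validated mean squared error for one candidate."""
    n = len(xs)
    # Fold k holds out observations k, k + v, k + 2v, ...
    folds = [list(range(k, n, v)) for k in range(v)]
    losses = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        # Fit on the training folds only, then score on the held-out fold.
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        losses.extend((ys[i] - predict(xs[i])) ** 2 for i in held_out)
    return mean(losses)


def discrete_super_learner(candidates, xs, ys, v=5):
    """Return the name of the candidate with the smallest CV risk, plus all risks."""
    risks = {name: cv_risk(fit, xs, ys, v) for name, fit in candidates.items()}
    best = min(risks, key=risks.get)
    return best, risks


# Simulated data (an assumption for illustration): linear signal plus noise.
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 0.5 * x + random.gauss(0, 1) for x in xs]

candidates = {"mean": fit_mean, "line": fit_line}
best, risks = discrete_super_learner(candidates, xs, ys)
print("selected:", best)
print("cross-validated risks:", risks)
```

Because every candidate is scored only on observations it was not fit to, the comparison does not reward overfitting, which is the reason cross-validation is used for the selection step.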
About the Instructor
Mark van der Laan, PhD, is Professor of Biostatistics and Statistics at UC Berkeley. His research group developed loss-based super learning in semiparametric models, based on cross-validation, as a generic optimal tool for the estimation of infinite-dimensional parameters, such as nonparametric density estimation and prediction with censored data. Building on this work, Mark's research group developed targeted minimum loss-based estimation as a general optimal methodology for statistical and causal inference. Recently, his group has worked towards developing a principled set of software tools for targeted learning, the tlverse.
Alan Hubbard, PhD, is Professor of Biostatistics. Research in Alan's group is generally motivated by applied problems in computational biology, epidemiology, and precision medicine.
This short course will also be instructed by Jeremy Coyle, PhD, a consulting data scientist who is leading the software development effort that has produced the tlverse ecosystem of R packages. Since the development of this workshop was a joint effort, the following PhD students in biostatistics will also co-instruct: Nima Hejazi, Ivana Malenica, and Rachael Phillips.
Relevance to Conference Goals
This full-day short course will provide participants with practical knowledge about analyzing data of various forms through the application of targeted learning, a state-of-the-art statistical method. Guided by R programming exercises, case studies, and intuitive explanation, participants will build a toolbox for applying the targeted learning statistical methodology, which will translate to real-world causal inference and statistical analyses. We will feature a diversity of data, relevant to a broad range of applied statisticians.
The overall objective of this course is to provide training to students, researchers, industry professionals, and faculty in science, public health, statistics, and other fields, empowering them with the knowledge and skills needed to use the sound methodology of Targeted Learning, a technique that provides tailored, pre-specified machines for answering queries, so that each data analysis is completely reproducible and estimators are efficient, minimally biased, and provide formal statistical inference. This objective aligns with the conference goals, and we therefore believe that we would be a good fit for a full-day short course.
Thu, Feb 20
8:00 AM - 5:30 PM
Regency B
SC2 - Introduction to R: From Programming to Tidying to Analysis
Short Course (full day)
Instructor(s): Philip D. Waggoner, The University of Chicago
The use of R is rapidly increasing in all corners of data science and empirical research. This is for good reason as R is not only a fast and efficient programming language and environment for doing statistics and data analysis, but it is also free and open source. As such, this course will offer a high-level introduction to the statistical computing language of R from start to finish. We will cover a range of topics in "base R" as well as fold in the “tidy” approach to wrangling and visualization in R. The end result will be a fully equipped researcher/practitioner who can efficiently and effectively move from obtaining a messy, unorganized data set to a polished, presentable final product across a variety of domains and applications.
Outline & Objectives
The goals of the course are to get participants comfortable engaging in basic coding in R, wrangling and cleaning complex data, troubleshooting errors on their own, estimating widely used models, and transforming numerical output into visually pleasing figures. As the course is geared toward beginners, no prior coding experience (in or out of R) is assumed. We will start at the ground level to ensure that everyone is at the same place.
As a rough outline, we will cover:
1. Getting started with R and RStudio // Packages // Basic Programming
2. Loading, cleaning, and wrangling data
3. Statistics: fitting, interpreting, and diagnosing widely used models (t-tests, OLS, binary response and count models)
4. Data Visualization: in Base R and the Tidyverse
5. (If time) Advanced Topics: Basic Webscraping and Text Analysis (preprocessing and wordclouds)
The goal is a high-level introduction to the practical use of R for a host of applications and fields. Thus, we start at the ground level, and no prerequisites or prior coding experience are necessary. Some basic applied statistics would be useful (but is not required) for fully understanding the model fitting portion.
About the Instructor
I have been using R professionally for many years, and I incorporated it into my Ph.D. dissertation. Further, I have taught a semester-long version of this course to Master of Public Policy students at the College of William & Mary. I have also written and coauthored many R packages of my own, and I am a member of "easystats", a software development group focused on writing packages to make statistics in R easy (https://github.com/orgs/easystats/people). In addition, a colleague (Ryan Kennedy, University of Houston) and I are writing a book introducing the Tidyverse version of R to the social science community. I already have scripts and many example datasets, as well as "worksheets" (.Rmd files), prepared for all units. These are available at my GitHub: https://github.com/pdwaggoner/Intro-to-R . Thus, I am prepared, experienced, and eager to present a high-level introduction to R to non-users or those wanting to widen their scope of statistical programming a bit more.
Relevance to Conference Goals
1. Learn statistical methods or programming techniques that apply to their job as applied statisticians: For this first goal, as this course is geared towards beginners, the assumption is that those who sign up will be eager to learn new techniques, which I will teach from start to finish. Further, I will give students sample data and R scripts for all topics so they can use, adapt, and extend these concepts in the future for their own purposes.
2. Better communicate and collaborate with their clients and customers: By learning these techniques, as well as how they fit into a broader framework of a consolidated research project, users will avoid the "piecemeal"/self-taught route of learning R which inevitably produces gaps in understanding. Instead, by taking this class, students will learn how all of these pieces (from wrangling to programming to fitting models and visualizing results) fit together and thus how they can best present information to interested parties.
3. Have a positive effect on their organization or enhance their professional development: The previous two goals being met, this third goal is a natural byproduct, where learning more == empowerment == excitement!
Thu, Feb 20
8:00 AM - 5:30 PM
Golden State
SC3 - Hands-On Introduction to Python in Data Science
Short Course (full day)
Instructor(s): Mei Najim, Advanced Analytics Consulting Services, LLC
This course is designed to provide a hands-on introduction to Python, the well-known open-source programming language for data science, including predictive modeling and data analysis. A case study using insurance data is employed to methodically expose attendees to data science best practices and to give them hands-on experience in Python. Sample data and Python code are provided.
Outline & Objectives
Outline:
(1) Learn how Jupyter Notebooks work, and cover the basics of programming including data structures, data operations, if else statements, for and while loops, and logical operations, etc.
(2) An in-depth Predictive Analytics Case Study in Insurance
Learning Objectives: Get some hands-on experience in Python
(1) Learn how to explore and prepare data in Python
(2) Use a variety of statistical methods and machine learning algorithms (GLMs, decision trees, random forests, and neural nets) to build predictive models in Python.
Audiences: Statisticians in sectors such as manufacturing, pharmaceuticals, banking, and government agencies; statistical researchers/analysts in universities; graduate students in statistics departments.
Prerequisites: BS/MS level education in statistics or mathematics with some programming experience; Install Jupyter Notebooks.
About the Instructor
Mrs. Mei Najim provides advanced analytics consulting services to the Property & Casualty insurance industry, mainly in Strategic Planning (developing short-term and long-term advanced analytics strategic plans for the organization) and Advanced Analytics Capability Building (developing full life cycle analytics processes, from raw data exploration to implementation of analytics solutions in IT data systems). Mei has 15 years of hands-on experience in big data advanced analytics, including statistical methods, machine learning algorithms, and data mining, in the Property & Casualty insurance industry. She also has experience in catastrophic modeling, actuarial pricing, reserving, and R&D. Mei has frequently presented at conferences to share and further develop her expertise. Mei holds a BS degree in Actuarial Science from Hunan University and two MS degrees, one in Applied Mathematics and the other in Statistics, from Washington State University. Mei is a member of the American Statistical Association and a Certified Specialist in Predictive Analytics (CSPA) of the Casualty Actuarial Society.
Relevance to Conference Goals
The objective is to provide attendees with hands-on experience about data science, modeling, and analyzing data of various forms through the application of state-of-the-art statistical methods and machine learning algorithms in Python.
Thu, Feb 20
8:00 AM - 12:00 PM
Regency C
SC4 - Side-by-Side Learning of R and Python by Analyzing Big Longitudinal Data
Short Course (half day)
Instructor(s): Mohammed Rahim Uddin Chowdhury, Kennesaw State University
R and Python are two widely used open-source interpreted programming languages with large and diverse communities. Because both are open source, new libraries are continuously developed and added to their respective catalogs as new mathematical, statistical, or other models are discovered. R has more than 12,000 packages available on CRAN (an open-source repository), which researchers can use to perform whatever analysis they need. This rich variety of libraries makes R the first choice for statistical analysis, especially for specialized analytical work. Python, on the other hand, does not have as many packages for data analysis and data modeling. Most data science work can be done with five Python libraries: NumPy, pandas, SciPy, scikit-learn, and seaborn. However, the scientific community recognizes that Python is catching up with R by rapidly developing packages for data mining and statistical modeling. In this short course at CSP 2020, I will show in detail side-by-side comparisons between R and Python on six topics: data mining and data analysis, tests of hypotheses, correlation and regression, simulation, mathematical computations, and text mining.
Outline & Objectives
The short course will discuss the application of R and Python to problems in
1. Data mining and data analysis (consists of 50 different data mining problems)
2. Tests of hypotheses and confidence intervals (consists of 20 different problems)
3. Regression models (16 different models will be discussed)
4. Simulations (9 different simulation designs will be discussed)
5. Mathematical computations (50 different problems will be computed)
6. Text mining (word clouds, sentiment analysis, and graphs of the most frequently used words will be discussed)
The objective of this short course is to train participants to use R and Python side by side to solve problems from the above topics in their professional work. Participants are not required to have prior knowledge of R or Python. The instructor will provide all the problems as easily understandable questions together with R and Python code. First, the instructor will discuss each problem, and then he will run the R and Python code together with the participants.
About the Instructor
I obtained my PhD in Statistics in 2013 and have worked as a tenure-track Assistant Professor of Statistics in the Department of Statistics and Analytical Science at Kennesaw State University since August 2015. During my four years at KSU, I have taught ten unique undergraduate and graduate courses, which is more than two new courses per year. Five are undergraduate courses, ranging from introductory statistics to R and Python programming. I was motivated to teach Python programming because it is in high and growing demand in industry, and many employers want data engineers with expertise in Python. The other five are graduate courses. I taught a special topics course in theoretical and computational Bayesian statistics for graduate students, using R to teach computational topics such as the EM algorithm, MCMC, Gibbs sampling, the Metropolis algorithm, and the Metropolis-Hastings algorithm. Another graduate course is Applied Time Series Analysis. For most courses, I prefer to teach with R. I taught the undergraduate R programming course in Fall 2018 and the Python programming course in Spring 2019.
Relevance to Conference Goals
The Conference on Statistical Practice is usually considered a platform for applied researchers who use novel statistical and machine learning methods to solve data-driven problems. R and Python both offer packages for solving such problems. This short course will introduce both R and Python for analyzing big longitudinal data. In addition, various simulation designs and text mining will be discussed. This course will help anyone interested in learning R and Python from scratch.
Thu, Feb 20
8:00 AM - 12:00 PM
Regency D
SC5 - Essential Collaboration: The ASCCR Frame
Short Course (half day)
Instructor(s): Heather Smith, Cal Poly; Eric Vance, LISA-University of Colorado Boulder
Statisticians and data scientists often collaborate with domain experts from many different fields in academia, business, and government. Learning more effective collaboration skills will enable us to maximize our professional impact in these areas. In this short course, participants will learn and practice essential skills that will enable them to improve their collaborations and add more value to their projects, customers, and organizations. We introduce the ASCCR framework that describes our current best practices for five aspects of statistical consulting and collaboration (Attitude-Structure-Content-Communication-Relationship). Specifically, participants will learn how to establish foundational collaborative Attitudes, implement the POWER Structure for conducting effective meetings, apply the Q1Q2Q3 approach to consultations and collaborations, Communicate more effectively, and adopt practical strategies to strengthen Relationships. Participants will practice these skills via team exercises, role-plays, video coaching, and individual reflections to become more effective collaborators, allowing them to have greater impact in their roles as statisticians and data scientists.
Outline & Objectives
Our objective is to introduce key concepts that will help participants improve their collaboration skills so they can return to key roles within their organizations and achieve greater impact. This short course will be useful for all levels from beginning to advanced. Prerequisites are a desire to improve one’s personal effectiveness and openness to try new methods and ways of thinking in the practice of statistics and data science.
1 Welcome and warm-up team exercises
2 Introduction to ASCCR Frame
3 Attitude of effective collaboration (participants complete Attitude checklist)
4 POWER structure (Prepare-Open-Work-End-Reflect) and why we believe this structure produces effective meetings
5 Best practices for opening meetings (Eric and Heather mock role play, video review, then participants role play)
6 Best practices for ending meetings (Eric and Heather mock role play, video review)
Break
7 Q1Q2Q3 approach to the Content of statistical projects (reflection exercise)
8 Triangle of Statistical Communication (team discussion)
9 Tips for strengthening Relationships (reflection exercise)
10 Overall written reflection and individual plan for improving collaboration skills.
About the Instructor
For the past 11 years, Dr. Eric Vance, an Associate Professor at the University of Colorado Boulder, has been the director of LISA (Laboratory for Interdisciplinary Statistical Analysis) where he has trained 271 statisticians to move between theory and practice to collaborate with 9500+ domain experts to apply statistics and data science to answer their research or business questions. He has taught workshops and webinars on collaboration in nine countries around the world, including several in collaboration with Heather Smith.
Heather Smith has 28 years of experience consulting with academic, industrial, service, and government clients in the United States, Europe, and Asia. She began this work as a statistical consultant at Westat, Inc. For 21 years she has been a faculty member in the Statistics Department at Cal Poly San Luis Obispo where she consults with academic and private sector researchers and teaches a wide variety of applied statistics courses, including courses in statistical communication and consulting. She has offered over a dozen workshops, short courses, and webinars on these topics, and has trained hundreds of statistical collaborators.
Relevance to Conference Goals
This short course is relevant to all three main conference goals. Participants will learn new skills and practical tips to apply whenever they interact with another person in their job as an applied statistician. Participants will explicitly learn how to better communicate and collaborate with their clients and customers. Skills learned in the course will equip participants to have a positive impact on their organization and an upward career trajectory. Participants will return to their jobs with new ideas, techniques, and strategies to improve their ability to communicate and collaborate effectively, resulting in a greater impact on their organizations and increasing the overall impact of statistics and data science in the world at large.
A version of this course was taught at the 2018 CSP and received a high average rating of 4.63 out of 5 (n=8 responding out of 22 participants). The official qualitative feedback we received: “This course is essential for any statistician who needs to collaborate with people in other disciplines, or sell their business to clients. I very strongly recommend it.” Unofficial feedback was very positive as well.
Thu, Feb 20
1:30 PM - 5:30 PM
Regency C
SC6 - Increasing Business Impact Through Automated Reporting in R
Short Course (half day)
Effective communication of results is among the essential duties of the industrial statistician, but the sometimes tedious mechanics of report production, together with the sheer volume of data that many statisticians now must process, combine to make report design an afterthought in too many cases. In this half-day course, we review recent advances in automated report production that free statisticians to focus on the interpretation and communication of results, while simultaneously reducing errors and increasing the consistency of analyses. We teach the course through an extended example, cumulatively building an R script that takes participants from receipt of an example dataset to a beautifully designed and nearly complete PowerPoint presentation, produced automatically using freely available, open-source packages. Details of how to customize the final presentation to incorporate corporate branding, such as logos, font choices, and color palettes, will also be covered.
Level: We recommend a minimal level of experience using R, RStudio, and the tidyverse.
Outline & Objectives
With this half-day course, we help industrial statisticians increase their business impact by leveraging tools for automated report production in R.
Topics covered include:
* What does automated reporting mean in practice?
* Scripting analyses, tables, and charts
* Automated production of PowerPoint presentations
* Building a "cookbook" of reporting recipes
* Font choices and color palettes
* Layering storytelling onto an automated report
About the Instructor
Dr. John Ennis is president of Aigora (www.aigora.com), a consulting and coaching organization dedicated to helping market researchers prepare for the rise of artificial intelligence. As part of this preparation, Aigora provides instruction in the automation of standard work practices, including report preparation. Dr. Ennis, a Ph.D. mathematician who conducted his postdoctoral training in computational neuroscience, has 11+ years of market research consulting experience, has presented at JSM and CSP, and will have presented at SDSS by the time of CSP 2020. In addition, Dr. Ennis is the author of over 30 peer-reviewed publications and two books on quantitative market research topics. Earlier this year, Dr. Ennis branched out from the Institute for Perception to found Aigora. In his prior work, he was a well-reviewed instructor at dozens of short courses covering quantitative market research, including instruction on topics within data science. In his professional work, Dr. Ennis has used tools for automated reporting for approximately five years, and he now teaches such tools to his clients operating within a variety of enterprise-level businesses.
Relevance to Conference Goals
Through participation in this course, attendees will learn to support their internal clients with well-designed and easy-to-read reports they prepare quickly and can continually improve over time, building their credibility and influence within their organizations.
Thu, Feb 20
1:30 PM - 5:30 PM
Regency D
SC7 - Building LaTeX Templates for R Markdown to Produce Branded PDF Reports
Short Course (half day)
Instructor(s): Ben Barnard, Wells Fargo
Branded reports give a clean, clear, and consistent message for data science teams in an organization. We walk through the process of building a LaTeX template distributed through an R package. We begin with a short introduction to rmarkdown and some motivating examples for using branded reports. Then, we demonstrate from scratch how to build a minimal LaTeX template and distribute it in an R package. We describe some best practices for branding and highlight the use of ggplot2 themes to match document branding. Finally, we walk through some further uses, such as parameterized reports, using the template with bookdown, and recommendations for deploying the R package at your company.
Outline & Objectives
Students should be able to walk away from this class with:
1. a general understanding of rmarkdown,
2. an understanding of why it is important to have branded reports,
3. an R package with a LaTeX template that uses their company's branding,
4. an understanding of best practices in branding,
5. experience using ggplot2 themes,
6. and some possible further uses for the template and ways to distribute it.
About the Instructor
Ben Barnard is a Data Scientist at Wells Fargo in the Team Member Insights group. Ben has a PhD from Baylor University in Statistics.
Jeff Idle is an Analytic Manager at Wells Fargo in the Team Member Insights group. Jeff leads the HR Advanced Analytics & Architecture team. Jeff is currently pursuing an MBA from the University of Minnesota's Carlson School of Management.
Relevance to Conference Goals
We stress using branded reports to communicate clean, clear, and consistent messages to your audience. Communication is the most important part of data science, since decision makers are rarely analytic experts. Branded reports bring a professionalism that will be greatly appreciated by administration. Building LaTeX templates saves time and ensures that every report looks the same. Consistently branded reports allow your team to be recognized immediately by its work product.
Thu, Feb 20
5:30 PM - 7:00 PM
Regency EF
PS1 - Poster Session 1 and Opening Mixer
Poster Session
Chair(s): Alek Kotolyan, dot818
Thu, Feb 20
5:30 PM - 7:00 PM
Regency EF
Exhibits Open
Exhibits