Thursday, February 20
Thu, Feb 20
8:00 AM - 5:30 PM
Regency A
SC1 - The tlverse Software Ecosystem for Targeted Learning
Short Course (full day)
Instructor(s): Alan Hubbard, University of California, Berkeley; Mark van der Laan, University of California, Berkeley
This full-day short course will provide a comprehensive introduction to the field of targeted learning and the corresponding tlverse software ecosystem (https://github.com/tlverse). In particular, we will focus on targeted minimum loss-based estimators of causal effects, including those of static, dynamic, optimal dynamic, and stochastic interventions. These multiply robust, efficient plug-in estimators use state-of-the-art ensemble machine learning tools to flexibly adjust for confounding while yielding valid statistical inference. In addition to discussion, this workshop will incorporate interactive activities and hands-on, guided R programming exercises, giving participants the opportunity to familiarize themselves with methodology and tools that translate to real-world data analysis. Participants are strongly encouraged to have an understanding of basic statistical concepts such as confounding, probability distributions, confidence intervals, hypothesis tests, and regression. Advanced knowledge of mathematical statistics may be useful but is not necessary. Familiarity with the R programming language is essential.
Outline & Objectives
By the end of this course participants should be able to:
1. Discuss the utility of the robust estimation strategy of targeted learning in comparison to conventional techniques, which often rely on restrictive statistical models and may therefore lead to severely biased inference.
2. Utilize the super learner, a loss-function-based tool that uses V-fold cross-validation, to obtain the best estimate of a prediction function of interest, such as a conditional mean outcome.
3. Calculate nonparametric variable importance metrics with both the super learner and targeted minimum loss-based estimators.
4. Estimate the causal effect of an intervention under static, dynamic, optimal individualized, and stochastic regimes using the tlverse.
5. Implement targeted minimum loss-based estimators when the outcome is subject to missingness, when mediators are present on the causal pathway, in high dimensions, and in studies with two-phase sampling.
6. Interpret the effect of interest under the real-world scenarios mentioned in learning objectives 4 and 5.
7. Construct novel targeted minimum loss-based estimators to extend the tlverse ecosystem of R packages.
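The selection step behind objective 2 can be sketched in a few lines. The block below is a minimal illustration in Python, not the tlverse's R implementation (the `sl3` package). It assumes squared-error loss and a hypothetical two-candidate library (`fit_mean`, `fit_linear`), and it shows only the "discrete" super learner, which picks the single candidate with the lowest V-fold cross-validated risk. The full super learner instead combines the candidates into a weighted ensemble through a meta-learner.

```python
import random
from statistics import mean


def fit_mean(xs, ys):
    """Hypothetical candidate 1: ignore x and predict the training mean of y."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Hypothetical candidate 2: least-squares line y = a + b * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = y_bar - b * x_bar
    return lambda x: a + b * x


def cv_risk(fit, xs, ys, v=5, seed=0):
    """V-fold cross-validated mean squared error of one candidate learner.

    The fixed seed gives every candidate the same fold assignment.
    """
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::v] for k in range(v)]  # v disjoint validation folds
    losses = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        losses.extend((ys[i] - model(xs[i])) ** 2 for i in fold)
    return mean(losses)


def discrete_super_learner(library, xs, ys, v=5):
    """Select the candidate with the lowest CV risk, then refit it on all data."""
    risks = {name: cv_risk(fit, xs, ys, v) for name, fit in library.items()}
    best = min(risks, key=risks.get)
    return best, risks, library[best](xs, ys)


# Toy data (hypothetical): y is linear in x plus Gaussian noise.
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]

best, risks, model = discrete_super_learner(
    {"mean": fit_mean, "linear": fit_linear}, xs, ys
)
print(best, risks)
```

Because the mean-only candidate ignores x, its cross-validated risk should be noticeably larger on these toy data, so the linear candidate is selected. In practice the library would contain many more, and more flexible, learners.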
About the Instructor
Mark van der Laan, PhD, is Professor of Biostatistics and Statistics at UC Berkeley. His research group developed loss-based super learning in semiparametric models, based on cross-validation, as a generic optimal tool for the estimation of infinite-dimensional parameters, as in nonparametric density estimation and prediction with censored data. Building on this work, Mark's research group developed targeted minimum loss-based estimation as a general optimal methodology for statistical and causal inference. More recently, his group has worked toward developing a principled set of software tools for targeted learning: the tlverse.
Alan Hubbard, PhD, is Professor of Biostatistics at UC Berkeley. Research in Alan's group is generally motivated by applied problems in computational biology, epidemiology, and precision medicine.
This short course will also be instructed by Jeremy Coyle, PhD, a consulting data scientist who is leading the software development effort that has produced the tlverse ecosystem of R packages. Since the development of this workshop was a joint effort, the following PhD students in biostatistics will also co-instruct: Nima Hejazi, Ivana Malenica, and Rachael Phillips.
Relevance to Conference Goals
This full-day short course will provide participants with practical knowledge about analyzing data of various forms through the application of targeted learning, a state-of-the-art statistical method. Guided by R programming exercises, case studies, and intuitive explanations, participants will build a toolbox for applying the targeted learning statistical methodology, which will translate to real-world causal inference and statistical analyses. The course will feature diverse data sets relevant to a broad range of applied statisticians.
The overall objective of this course is to train students, researchers, industry professionals, and faculty in science, public health, statistics, and other fields, equipping them with the knowledge and skills necessary to use the sound methodology of targeted learning: a technique that provides tailored, pre-specified machines for answering queries, so that each data analysis is completely reproducible and the estimators are efficient, minimally biased, and provide formal statistical inference. This objective aligns with the conference goals, which is why we believe the material is well suited to a full-day short course.
Thu, Feb 20
8:00 AM - 5:30 PM
Regency B
SC2 - Introduction to R: From Programming to Tidying to Analysis
Short Course (full day)
Instructor(s): Philip D. Waggoner, The University of Chicago
The use of R is rapidly increasing in all corners of data science and empirical research. This is for good reason: R is not only a fast and efficient programming language and environment for statistics and data analysis, but also free and open source. This course will therefore offer a high-level introduction to the statistical computing language R from start to finish. We will cover a range of topics in "base R" and fold in the "tidy" approach to data wrangling and visualization. By the end, participants will be fully equipped researchers and practitioners who can efficiently and effectively move from a messy, unorganized data set to a polished, presentable final product across a variety of domains and applications.
Outline & Objectives
The goals of the course are to get participants comfortable with basic coding in R, wrangling and cleaning complex data, troubleshooting errors on their own, estimating widely used models, and transforming numerical output into visually pleasing figures. As the course is geared toward beginners, no prior coding experience (in or out of R) is assumed. We will start at the ground level to ensure that everyone begins from the same place.
As a rough outline, we will cover:
1. Getting started with R and RStudio // Packages // Basic programming
2. Loading, cleaning, and wrangling data
3. Statistics: fitting, interpreting, and diagnosing widely used models (t-tests, OLS, binary response and count models)
4. Data visualization in base R and the tidyverse
5. (If time permits) Advanced topics: basic web scraping and text analysis (preprocessing and word clouds)
The goal is a high-level introduction to the practical use of R for a host of applications and fields. Thus, we start at the ground level, and no prerequisites or prior coding experience are necessary. Some background in basic applied statistics would be useful (but is not required) to fully understand the model-fitting portion.
About the Instructor
I have been using R professionally for many years and incorporated it into my Ph.D. dissertation. I have also taught a semester-long version of this course to Master of Public Policy students at the College of William & Mary. In addition, I have written and coauthored many R packages of my own, and I am a member of "easystats", a software development group focused on writing packages that make statistics in R easy (https://github.com/orgs/easystats/people). A colleague (Ryan Kennedy, University of Houston) and I are also writing a book introducing the Tidyverse version of R to the social science community. I already have scripts, many example datasets, and "worksheets" (.Rmd files) prepared for all units; these are available on my GitHub: https://github.com/pdwaggoner/Intro-to-R. Thus, I am prepared, experienced, and eager to present a high-level introduction to R to non-users and to those wanting to widen their scope of statistical programming.
Relevance to Conference Goals
1. Learn statistical methods or programming techniques that apply to their job as applied statisticians: For this first goal, as this course is geared toward beginners, the assumption is that those who sign up will be eager to learn new techniques, which I will teach from start to finish. I will also give students sample data and R scripts for all topics so they can adapt and extend these concepts for their own work in the future.
2. Better communicate and collaborate with their clients and customers: By learning these techniques, as well as how they fit into the broader framework of a consolidated research project, users will avoid the "piecemeal", self-taught route of learning R, which inevitably produces gaps in understanding. Instead, by taking this class, students will learn how all of these pieces (from wrangling to programming to fitting models and visualizing results) fit together, and thus how they can best present information to interested parties.
3. Have a positive effect on their organization or enhance their professional development: The previous two goals being met, this third goal is a natural byproduct, where learning more == empowerment == excitement!
Thu, Feb 20
8:00 AM - 5:30 PM
Golden State
SC3 - Hands-On Introduction to Python in Data Science
Short Course (full day)
Instructor(s): Mei Najim, Advanced Analytics Consulting Services, LLC
This course is designed to provide a hands-on introduction to Python, the well-known open-source programming language for data science, including predictive modeling and data analysis. A case study using insurance data is employed to methodically expose attendees to data science best practices and to give them hands-on experience in Python. Sample data and Python code are provided.
Outline & Objectives
Outline:
(1) Learn how Jupyter Notebooks work, and cover the basics of programming, including data structures, data operations, if/else statements, for and while loops, and logical operations.
(2) An in-depth Predictive Analytics Case Study in Insurance
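As an illustration of the programming basics listed in item (1), the block below uses only built-in Python (no Jupyter-specific features) to exercise a data structure, a simple data operation, if/else branching with logical operators, and for and while loops. The insurance-flavored records, field names, and thresholds are hypothetical toy values, not the course's case-study data.

```python
# Data structure: a list of dicts standing in for hypothetical policy records.
policies = [
    {"id": 1, "premium": 1200.0, "claims": 0},
    {"id": 2, "premium": 950.0, "claims": 2},
    {"id": 3, "premium": 1500.0, "claims": 1},
]

# Data operation inside a for loop: total premium across all policies.
total_premium = 0.0
for p in policies:
    total_premium += p["premium"]

# if/elif/else with logical operators: label each policy (hypothetical rules).
flags = {}
for p in policies:
    if p["claims"] >= 2 and p["premium"] < 1000:
        flags[p["id"]] = "review"
    elif p["claims"] == 0 or p["premium"] > 2000:
        flags[p["id"]] = "standard"
    else:
        flags[p["id"]] = "monitor"

# while loop: how many policies, taken in list order, fit under a budget.
budget, spent, n_fit = 2500.0, 0.0, 0
while n_fit < len(policies) and spent + policies[n_fit]["premium"] <= budget:
    spent += policies[n_fit]["premium"]
    n_fit += 1

print(total_premium, flags, n_fit)
```

Here the running total is 3650.0, and the while loop stops after two policies because adding the third would exceed the 2500.0 budget.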
Learning Objectives: Get some hands-on experience in Python
(1) Learn how to explore and prepare data in Python
(2) Use a variety of statistical methods and machine learning algorithms (GLMs, decision trees, random forests, and neural networks) to build predictive models in Python.
Audience: Statisticians in sectors such as manufacturing, pharmaceuticals, banking, and government agencies; statistical researchers/analysts in universities; graduate students in statistics departments.
Prerequisites: BS/MS-level education in statistics or mathematics with some programming experience; Jupyter Notebooks installed before the course.
About the Instructor
Mrs. Mei Najim provides advanced analytics consulting services to the Property & Casualty insurance industry, mainly in Strategic Planning (developing short-term and long-term advanced analytics strategic plans for the organization) and Advanced Analytics Capability Building (developing full life cycle analytics processes, from raw data exploration to implementing analytics solutions in IT data systems). Mei has 15 years of hands-on big data advanced analytics experience, including statistical methods, machine learning algorithms, and data mining, in the Property & Casualty insurance industry. She also has experience in catastrophe modeling, actuarial pricing, reserving, and R&D. Mei has frequently presented at conferences to share and further develop her expertise. Mei holds a BS degree in Actuarial Science from Hunan University and two MS degrees, one in Applied Mathematics and the other in Statistics, from Washington State University. Mei is a member of the American Statistical Association and a Certified Specialist in Predictive Analytics (CSPA) of the Casualty Actuarial Society.
Relevance to Conference Goals
The objective is to provide attendees with hands-on experience in data science, modeling, and analyzing data of various forms through the application of state-of-the-art statistical methods and machine learning algorithms in Python.