Thursday, February 23
SC2 Becoming a Student of Leadership in Statistics
Thu, Feb 23, 8:00 AM - 12:00 PM
City Terrace 7
Instructor(s): Matthew Gurka, University of Florida; Robert Rodriguez, SAS Institute Inc.; Gary R. Sullivan, Eli Lilly & Company

Download Handouts
What is leadership? Much has been written and discussed within the statistics profession in the last few years on the topic and its importance in advancing our profession. This course will provide an introductory understanding of leadership as well as initial direction for statisticians who wish to develop as leaders. It will feature a leader in the statistics profession speaking on their personal journey as well as providing guidance on personal leadership development. You will also be introduced to some important leadership competencies - including influence, business acumen, and communication - and will begin to draft a plan for (1) developing your own leadership or (2) addressing a leadership challenge in your work. Finally, you will spend time reflecting on leadership learnings and networking with other statisticians and practitioners.

Outline & Objectives

1. Gain a better understanding of leadership, including:
- How established leaders in our profession have developed their leadership skills;
- Insights and perspectives on leadership from other professional statisticians, gained through interactions, discussions, and group work, that will improve each attendee’s ability to lead;
- Inspiration to become a leader.

2. Improve your ability to recognize and develop the characteristics of leaders, including:
- Ideas on how to acquire greater organizational/business acumen;
- Insights into the importance of communication in leadership;
- Other qualities that affect a leader’s ability to influence.

3. Develop a path for your own leadership development, including:
- A draft of your own leadership principles and a plan for continuing your leadership journey;
- Membership in a small peer leadership group that will continue to share learnings, perspectives, experiences, and ideas on leadership after the workshop.

About the Instructor

Gary R. Sullivan, Ph.D., is the senior director of Non-Clinical Statistics at Eli Lilly and Company, a major pharmaceutical manufacturer headquartered in Indianapolis, IN. Gary joined Lilly in 1989 and has 13 years of experience as a project statistician where he collaborated with pharmaceutical formulators, chemists, biologists, and engineers on formulation design, process optimization & modeling, assay development & characterization, and production monitoring. He has spent the last 14 years in various leadership positions with responsibilities for statisticians collaborating in manufacturing, product development, discovery research, and biomarker research. His personal passions include quality management, experimental design, process optimization and leadership development. Gary has developed leadership training for the Biometrics organization at Eli Lilly, proposed and facilitated the initial JSM leadership course in 2014, and is the current chair for the ASA Ad Hoc Leadership Committee. Gary received his Ph.D. in statistics in 1989 from Iowa State University working with Dr. Wayne A. Fuller.

Matthew J. Gurka, Ph.D., is a Professor in the Department of Health Outcomes and Policy at the University of Florida (UF), where he is also the Associate Director of the Institute for Child Health Policy. Prior to his recent appointment at UF, Matthew was the Founding Chair of the Department of Biostatistics in the School of Public Health at West Virginia University, where he also led the Clinical Research Design, Epidemiology, and Biostatistics Program of the West Virginia Clinical and Translational Science Institute (WVCTSI). Matthew received a Ph.D. in biostatistics in 2004 from the University of North Carolina at Chapel Hill. In addition to his interests in mixed models and power analysis, Matthew has extensive collaborative and independent research experience in pediatrics. Recently he has focused on childhood and adult obesity, where he has obtained PI funding from the NIH to develop and validate improved measures of the metabolic syndrome. He recently completed a term on the Executive Editorial Board of the journal Pediatrics and is currently on the Editorial Board of the Journal of Pediatrics. Matthew has participated in the JSM leadership course since 2014, and is a member of the ASA Ad Hoc Leadership Committee.

Relevance to Conference Goals

The above objectives of this course align with the goal of the Conference on Statistical Practice to “provide opportunities for attendees to further their career development and strengthen relationships in the statistical community.” Understanding leadership and learning ways to develop leadership skills further are important for the continued development of one’s career. In addition, this course provides the opportunity to build peer relationships among fellow participants who share an interest in leadership.

Software Packages


SC3 Peering into the Future: Introduction to Time Series Methods for Forecasting
Thu, Feb 23, 8:00 AM - 12:00 PM
City Terrace 9
Instructor(s): Dave Dickey, North Carolina State University

Download Handouts
This workshop will provide a practical guide to time series analysis and forecasting, focusing on examples and applications in modern software. Students will learn how to recognize autocorrelation when they see it and how to incorporate autocorrelation into their modeling. Models in the ARIMA class and their identification, fitting, and diagnostic testing will be emphasized and extended to models with deterministic trend functions (inputs) and ARMA errors. Diagnosing stationarity, a critical feature for proper analysis, will be demonstrated. After the course, students should be able to identify, fit, and forecast with this class of time series models and be aware of the consequences of having autocorrelated data. They should be able to recognize nonstationary cases in which the differences in the data, rather than the levels, should be analyzed. Underlying ideas and interpretation of output, rather than code, will be emphasized. No previous experience with any particular software is needed. Examples will be computed in SAS, but most modern statistical packages such as SPSS, R, STATA, etc. can be used for time series analysis.

Outline & Objectives

Outline of course topics:

(1) Identifying and fitting ARMA models

(2) Incorporating inputs: Regression with Time Series Errors

(3) Intervention Analysis

(4) Nonstationarity: Unit Roots and Stochastic Trends

(Optional: Seasonal models, time permitting)

Benefits of the course include an understanding of the new issues that arise when data are taken over time and of how to deal with them. Students will learn not only the new analysis techniques these situations require but also the additional terminology that accompanies them.
Examples and practical interpretation, along with the strengths and weaknesses of competing forecasting methodologies, will be emphasized.
I hope to give examples of interesting data analyses that can be used as templates for analyzing the participants' own data when they return home.
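The course computes its examples in SAS, but the first skill above, recognizing autocorrelation, is easy to sketch in any language. Here is a minimal pure-Python illustration that computes sample autocorrelations of a simulated AR(1) series (the series and the coefficient 0.8 are illustrative assumptions, not course material):

```python
import random

def acf(y, k):
    """Lag-k sample autocorrelation of the series y."""
    n = len(y)
    m = sum(y) / n
    c0 = sum((v - m) ** 2 for v in y) / n                          # lag-0 autocovariance
    ck = sum((y[t] - m) * (y[t + k] - m) for t in range(n - k)) / n
    return ck / c0

# Simulate an AR(1) process: y[t] = 0.8 * y[t-1] + e[t].
random.seed(1)
y, prev = [], 0.0
for _ in range(2000):
    prev = 0.8 * prev + random.gauss(0, 1)
    y.append(prev)

# For AR(1) with coefficient 0.8 the theoretical ACF decays as 0.8**k;
# geometric decay like this is the classic signature used in model identification.
for k in (1, 2, 3):
    print(k, round(acf(y, k), 2))
```

In SAS, PROC ARIMA's IDENTIFY statement produces the corresponding ACF (and PACF) displays used for model identification.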

About the Instructor

David A. Dickey received his PhD in statistics in 1976 from Iowa State University working with Dr. Wayne A. Fuller. Their “Dickey-Fuller” test is a part of most modern time series software packages. He is on the ISI’s list of highly cited researchers and is an ASA Fellow. Dickey is William Neal Reynolds Professor of Statistics at North Carolina State University, where he does time series research, teaches graduate-level methods courses, consults, and mentors graduate students. He is coauthor of several books on statistics, including “The SAS System for Forecasting Time Series,” a publication of SAS Institute. He has presented at many conferences, including the 2013 ASA Conference on Statistical Practice and several JSM sessions. He has been a contract instructor for SAS Institute since 1981, teaching courses in statistical methodology, including time series, and has helped write some of their course notes. Recently Dickey has been teaching for NC State University's Institute for Advanced Analytics, which offers an intensive applied Master’s degree in a nine-month cohort program. He has appointments in Economics and the NCSU Financial Math program.

Relevance to Conference Goals

The student will be better able to communicate intelligently with clients having data taken over time by learning the terms and the concepts behind them. The benefits of being able to better forecast what is going to happen next should be of obvious value to any company collecting data over time. The successful student should be able to carry out an analysis of time dependent data from model identification, through fitting and diagnostic checking, all the way to producing forecasts.

Software Packages

SC4 Producing High-Quality Figures in SAS to Meet Publication Requirements, with Practical Examples
Thu, Feb 23, 8:00 AM - 12:00 PM
City Terrace 12
Instructor(s): Charlie Chunhua Liu, Allergan PLC

Download Handouts
This half-day short course will cover publication requirements for high-quality figures, discuss principles for producing high-quality figures in SAS, and demonstrate the use of both SAS/GRAPH and ODS Graphics procedures to produce some commonly used types of figures (line plots, scatter plots, bee swarm plots, box plots, box plots overlaid with bee swarm plots, etc.).

The instructor will also demonstrate how to produce the above-mentioned high-quality figures in listing formats (EMF, EPS, etc.) and document formats (RTF, PDF, etc.).

Outline & Objectives

The proposed half-day short course will cover the following:

1) Publication requirements on high-quality figures

2) SAS/GRAPH options and formats for high-quality figures

3) Producing high-quality figures in listing format (EMF, WMF, PS, EPS, etc.) and document format (PDF, RTF, etc.)

4) Producing some commonly used types of figures, including line plots, jittered scatter plots, lined-up jittered scatter plots (also known as bee swarm plots), box plots, and scatter plots overlaid with box plots

5) Producing high-quality figures in SAS Enterprise Guide (EG) environment

About the Instructor

Charlie Liu, PhD, is the author of the book "Producing High-quality Figures Using SAS/GRAPH and ODS GRAPHICS Procedures" (CRC Press, Taylor & Francis Group, 2015).

Dr. Liu has worked as a SAS programmer and project statistician for more than a decade at several research institutions and pharmaceutical companies, including the US EPA, the National Institute of Statistical Sciences (NISS), Washington University School of Medicine in St. Louis, Eli Lilly and Company, Allergan Inc., and Kythera Biopharmaceuticals. He is now an associate director of biostatistics at Allergan PLC.

Dr. Liu is an experienced conference speaker and has presented at various SAS user conferences, including SAS Global Forum 2013 and JSM 2015. He won an Outstanding Speaker Award at the 2007 Mid-West SAS User Group (MWSUG) Conference in Des Moines, IA.

Relevance to Conference Goals

The proposed half-day short course will help participants in the areas outlined below.
1) Learn principles and techniques for producing high-quality figures in SAS to meet publication requirements.
2) Learn how to produce some commonly used graphs in SAS, including line plots, scatter plots, bee swarm plots, thunderstorm scatter plots, and box plots.
3) Encourage statisticians/programmers to present scientific research data using high-quality graphs.

Software Packages

SAS 9.3 or higher

SC5 Linear Mixed Models Through Health Sciences Applications
Thu, Feb 23, 8:00 AM - 12:00 PM
River Terrace 3
Instructor(s): Constantine Daskalakis, Thomas Jefferson University

Download Handouts
This course will focus on the heuristic understanding of linear mixed models and their implementation (including assessment of assumptions and model fit, and interpretation of results), rather than formal statistical theory. The following general topics will be covered: a. Specification and interpretation of the fixed effects (population-averaged/mean) model. b. Specification and interpretation of the random effects and their covariance structure (subject-specific effects). c. Considerations regarding the error structure. d. Statistical and graphical methods of assessment of (a), (b), and (c), and model selection strategies. e. Determination, estimation, and testing of linear combinations/contrasts of coefficients to address scientific objectives. f. Writing brief summaries of the results for non-statistical audiences.

These topics will be addressed through the analysis of data from two studies: (1) a school-based intervention program designed to impact students’ body mass index (BMI); and (2) an animal xenograft experiment designed to assess the effects of a drug and of radiotherapy on tumor growth.
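The course itself uses SAS (Proc Mixed) and Stata (-mixed-); still, the core idea of a subject-specific random effect can be sketched with nothing but the standard library. This illustrative pure-Python simulation (the variance values are invented, not taken from the course's two studies) generates data from a random-intercept model and recovers the variance components with classical one-way ANOVA logic:

```python
import random
import statistics

random.seed(7)
beta0, sd_b, sd_e = 10.0, 2.0, 1.0      # fixed intercept, random-intercept SD, error SD
n_subj, n_obs = 500, 8                  # subjects and repeated measures per subject

# y[i][j] = beta0 + b_i + e_ij, where b_i is the subject-specific random intercept.
data = []
for _ in range(n_subj):
    b_i = random.gauss(0, sd_b)
    data.append([beta0 + b_i + random.gauss(0, sd_e) for _ in range(n_obs)])

# Within-subject variance estimates sigma_e^2; the variance of the subject means
# estimates sigma_b^2 + sigma_e^2 / n_obs (one-way ANOVA decomposition).
within = statistics.mean(statistics.variance(subj) for subj in data)
between = statistics.variance(statistics.mean(subj) for subj in data)
sigma_e2 = within
sigma_b2 = between - within / n_obs

# Intraclass correlation: how strongly measurements on the same subject correlate.
icc = sigma_b2 / (sigma_b2 + sigma_e2)   # true value here: 4 / (4 + 1) = 0.8
print(round(sigma_b2, 2), round(sigma_e2, 2), round(icc, 2))
```

Proc Mixed and -mixed- estimate these same quantities by (restricted) maximum likelihood and extend naturally to unbalanced data, covariates, and richer covariance structures.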

Outline & Objectives

This course is appropriate for an audience that has knowledge of statistics at the level of applied regression. The main requirement is a basic understanding of concepts of confidence intervals, statistical testing, and general regression modeling. The audience may consist of:
a. undergraduate or graduate students in statistics and related quantitative fields with biomedical focus; and
b. consulting/applied statisticians analyzing multi-level data or longitudinal data with repeated measures in the health sciences.

Participants will learn how to apply, evaluate, and interpret linear mixed-effects regression models, through two health sciences applications. Specifically, they will learn how to:
a. Perform linear mixed-effects regression modeling in SAS or Stata.
b. Specify and perform appropriate comparisons/contrasts.
c. Display results in tabular and graphical form.
d. Assess model assumptions, evaluate model fit, and compare alternative specifications/models.
e. Interpret statistical results (estimates, p-values, etc.) to address scientific objectives.
f. Communicate findings to non-statistical audiences.

About the Instructor

Dr. Daskalakis is Associate Professor of Biostatistics at Thomas Jefferson University and has 15 years of experience as a collaborating statistician in a biomedical environment. He has worked on numerous published studies involving mixed-effects modeling of both hierarchical and longitudinal data. He has taught biostatistics, clinical trials, and regression methods to non-statistical audiences (including a shorter version of the proposed course). Dr. Daskalakis has been very active in ASA’s professional activities through the Section of Teaching of Statistics in the Health Sciences.

Relevance to Conference Goals

The course is aligned with the conference’s second theme, “Data Modeling and Analysis.” In line with the conference’s goal, the course will allow participants to enhance their programming, analysis, and communication skills. It has been designed as a practical hands-on tutorial on linear mixed models, a modern regression approach to the analysis of correlated hierarchical, clustered, and/or longitudinal data. The course may have special value for consulting/applied statisticians who are a large fraction of CSP’s attendees.

Software Packages

Participants are strongly encouraged to bring their computers for hands-on practice.
The course will use SAS (Proc Mixed) and Stata (–mixed–) code and output. Both of those programs have strong capabilities in fitting linear mixed models, with user-friendly modules. In contrast, R’s packages for fitting linear mixed models (nlme and lme4) have limitations, and their syntax and use (beyond fitting simple models) can be quite complicated. For these reasons, R will not be used in the course.
Extensive SAS and Stata code for fitting linear mixed models will be provided to participants. R code will also be provided but will not be discussed.

SC6 Text Analytics and Its Applications
Thu, Feb 23, 1:30 PM - 5:30 PM
City Terrace 7
Instructor(s): Edward Jones, Texas A&M University
Text analytics refers to the process of deriving actionable insights from text data. This half-day course explores the evolution and creative application of text analytics to solving business problems. Emphasis is placed on how text analytics is used for solving typical forecasting and classification problems by integrating structured and unstructured text data. Solutions are illustrated using SAS Text Miner, R and Python with real world applications in finance and social media.

Outline & Objectives

The course is organized into three main sections designed to cover topics ranging in degree of difficulty from the basic to the advanced:
1. Basics of Text Analytics & Techniques for Acquiring and Pre-Processing Text Data
2. Solving the Primary Text Analytics Problem - Topic Analysis
3. Integrating Structured and Unstructured Text Data

Participants with statistical programming experience will gain information on incorporating text analysis into their statistical analyses.

About the Instructor

Dr. Jones holds a PhD in Statistics from Virginia Tech and a B.S. in Computer Science from Texas A&M University - Commerce. He has over 10 years of experience developing statistical and data mining software for companies in Silicon Valley and for Rogue Wave Software. He designed and wrote the data mining software incorporated in IMSL, the International Mathematical and Statistics Library. Currently he teaches data mining and analytics at Texas A&M University. He also consults with companies on business analytics and quality assurance, and is co-founder of Texas A&M Statistical Services.

Relevance to Conference Goals

Software Packages

SC7 Expressing Yourself with R
Thu, Feb 23, 1:30 PM - 5:30 PM
City Terrace 9
Instructor(s): Hadley Wickham, RStudio
In this mini-workshop you'll learn how to better express yourself in R. To express yourself clearly in R you need to know how to write high quality functions and how to use a little functional programming (FP) to solve common programming challenges. You'll learn:

* The three key properties of a function.
* A proven strategy for writing new functions.
* How to use functions to reduce duplication in your code.
* How `lapply()` works and why it's so important.
* A handful of FP tools that increase the clarity of your code.

This workshop is suitable for beginning and intermediate R users. You need to know the basics of R (like importing your data and executing basic instructions). If you're an advanced R user, you probably won't learn anything completely new, but you will learn techniques that allow you to solve new challenges with greater ease.
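These ideas are not specific to R. As a rough Python analog (illustrative only; the workshop itself is in R), a small function plus a comprehension plays the role that a function plus `lapply()` plays in R, collapsing copy-pasted code into one definition applied many times:

```python
def rescale(xs):
    """Scale a list of numbers to the [0, 1] range."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

samples = {
    "a": [1, 5, 3],
    "b": [10, 20, 15],
}

# The analogue of lapply(samples, rescale): apply one function to every element,
# instead of repeating the rescaling code once per dataset.
scaled = {name: rescale(xs) for name, xs in samples.items()}
print(scaled)
```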

The workshop will be hands-on and interactive, so please make sure to bring along your laptop with R installed!

Outline & Objectives

In this mini-workshop you'll learn how to better express yourself in R. To express yourself clearly in R you need to know how to write high quality functions and how to use a little functional programming (FP) to solve common programming challenges.

About the Instructor

Hadley is Chief Scientist at RStudio and a member of the R Foundation. He builds tools (both computational and cognitive) that make data science easier, faster, and more fun. His work includes packages for data science (the tidyverse: ggplot2, dplyr, tidyr, purrr, readr, ...), and principled software development (roxygen2, testthat, devtools). He is also a writer, educator, and frequent speaker promoting the use of R for data science. Learn more on his website.

Relevance to Conference Goals

Modern data analysis must be performed on a computer, and if you're doing data analysis on a computer, it's worth the investment to learn a programming language. In this tutorial, you'll learn some useful tools in R that improve your ability to automate repeated parts of your analyses.

Software Packages

R + purrr package

SC8 Missing Data Analysis with R/SAS/Stata
Thu, Feb 23, 1:30 PM - 5:30 PM
City Terrace 12
Instructor(s): Din Chen, The University of North Carolina at Chapel Hill; Frank Liu, Merck Research Labs

Download Handouts
Missing data are nearly universal in applied research; almost all applied researchers face missing data at some point. However, not all researchers assess missingness or deal with missing data appropriately. Instead, researchers often drop the missing values (e.g., listwise deletion), which reduces the sample size and lowers statistical power, or use ad hoc single imputation, such as last observation carried forward (LOCF), for simplicity. Both approaches can bias parameter estimates, and such inefficient, potentially biased statistical inference can lead to erroneous research conclusions.

This short course aims to address the problems of missing data. The concept of different missing data mechanisms or typologies, including missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR), will be discussed with illustrations from real clinical trial examples. Moreover, this short course will introduce two commonly used model-based methods for missing data analysis, multiple imputation (Little & Rubin, 2002; Reiter & Raghunathan, 2007) and maximum likelihood (Allison, 2012), using R and SAS. Some sensitivity analysis approaches for handling missing data under MNAR will also be discussed briefly.
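As a small illustration of why listwise deletion can mislead, this pure-Python sketch (the missingness mechanism and all numbers are invented for illustration) simulates an outcome that is missing at random given a covariate. The complete-case sample is both smaller and biased:

```python
import random

random.seed(42)

# y depends on x, and y is far more likely to be missing when x > 0:
# a missing-at-random (MAR) mechanism, since missingness depends only on x.
full, observed = [], []
for _ in range(10_000):
    x = random.gauss(0, 1)
    y = x + random.gauss(0, 0.5)
    full.append(y)
    keep_prob = 0.2 if x > 0 else 0.8
    if random.random() < keep_prob:
        observed.append(y)

true_mean = sum(full) / len(full)         # close to 0 by construction
cc_mean = sum(observed) / len(observed)   # complete-case (listwise-deletion) mean
print(len(observed), round(true_mean, 2), round(cc_mean, 2))
```

The complete-case mean is pulled well below zero because large-x (hence large-y) observations were preferentially dropped; model-based methods such as multiple imputation use the observed x to correct exactly this kind of distortion.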

Outline & Objectives

This short course will:

a) review missing data issues and different missing data mechanisms (i.e., MCAR, MAR, and MNAR);

b) introduce the multiple imputation and maximum likelihood methods in regression modeling and discuss the advantages and disadvantages of each method;

c) illustrate the use of R/SAS to analyze real data from clinical trial studies and compare the consistency of results across the different software packages.

About the Instructor

Dr. Din Chen is a Fellow of the ASA. He is now the Wallace H. Kuralt Distinguished Professor and Director of the Consortium for Statistical Development and Consultation in the School of Social Work, and a professor of biostatistics in the Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill. He was previously a professor in biostatistics at the University of Rochester and the Karl E. Peace endowed eminent scholar chair in biostatistics at Georgia Southern University. Professor Chen is also a senior statistical consultant for biopharmaceutical companies and government agencies, with extensive expertise in clinical trials and bioinformatics. He has more than 100 refereed professional publications and has co-authored 10 books on clinical trial methodology and public health applications. Professor Chen was honored with the "Award of Recognition" in 2014 by the Deming Conference Committee for highly successful biostatistics workshop tutorials.

His “Applied Meta-analysis” short course at CSP 2016 was well received by attendees.
Dr. Frank Liu is a Distinguished Scientist in Biostatistics at Merck Research Laboratories. He has over 21 years of pharmaceutical industry experience supporting multiple therapeutic areas, including neuroscience, psychiatry, infectious disease, and vaccine products. His research interests include methods for longitudinal trials, missing data issues, safety data analysis, and the design and analysis of non-inferiority trials. He has published more than 30 statistical papers and book chapters and regularly presents at professional meetings. He co-led a subteam for Bayesian missing data analysis in the DIA Bayesian Working Groups, and co-taught short courses on “Sensitivity Analysis Using Bayesian and Imputation Approaches” at the 2015 Deming Conference and “SAS Biopharmaceutical Applications” at the 2016 Regulatory-Industry Workshop. He has been leading working groups within Merck to develop guidance documents on the analysis of missing data.

Relevance to Conference Goals

1. To present up-to-date developments in missing data analysis, guiding participants through techniques for missing data imputation and maximum likelihood methods

2. To give an overview of R/SAS implementations for missing data analysis

3. To emphasize the applied aspects of dealing with missing data through real examples, helping attendees solve the real-life problems they face in research and consulting as applied statisticians and analysts

Software Packages


SC9 Bootstrap Methods and Permutation Tests
Thu, Feb 23, 1:30 PM - 5:30 PM
River Terrace 3
Instructor(s): Tim Hesterberg, Google

Download Handouts
We begin with a graphical approach to bootstrapping and permutation testing, illuminating basic statistical concepts of standard errors, confidence intervals, p-values and significance tests.

We consider a variety of statistics (mean, trimmed mean, regression, etc.), and a number of sampling situations (one-sample, two-sample, stratified, finite-population), stressing the common techniques that apply in these situations. We'll look at applications from a variety of fields, including telecommunications, finance, and biopharm.

These methods let us compute confidence intervals and perform hypothesis tests when formulas are not available. This lets us do better statistics, e.g., use robust statistics such as a median or trimmed mean instead of a mean. They can help clients understand statistical variability. And some of the methods are more accurate than standard methods.
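The general bootstrap procedure fits in a few lines of code. This pure-Python sketch (the skewed sample and the choice of statistic are illustrative, not course material) computes a bootstrap standard error and a simple percentile confidence interval for a mean:

```python
import random
import statistics

random.seed(0)
data = [random.expovariate(1.0) for _ in range(200)]   # a skewed sample

def bootstrap(stat, data, reps=2000):
    """Bootstrap distribution of `stat`: resample with replacement, recompute."""
    n = len(data)
    return sorted(stat([random.choice(data) for _ in range(n)]) for _ in range(reps))

boot = bootstrap(statistics.mean, data)
se = statistics.stdev(boot)            # bootstrap standard error
lo = boot[int(0.025 * len(boot))]      # simple percentile confidence interval
hi = boot[int(0.975 * len(boot))]
print(round(se, 3), round(lo, 2), round(hi, 2))
```

Swapping `statistics.mean` for a trimmed mean or median requires no new formulas, which is precisely the appeal described above.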

Outline & Objectives

Introduction to Bootstrapping
General procedure
Why does bootstrapping work?
Sampling distribution and bootstrap distribution

Bootstrap Distributions and Standard Errors
Distribution of the sample mean
Bootstrap distributions of other statistics
Simple confidence intervals
Two-sample applications

How Accurate Is a Bootstrap Distribution?

Bootstrap Confidence Intervals
Bootstrap percentiles as a check for standard intervals
More accurate bootstrap confidence intervals

Significance Testing Using Permutation Tests
Two-sample applications
Other settings

Wider variety of statistics
Variety of applications
Examples where things go wrong, and what to look for

Wider variety of sampling methods
Stratified sampling, hierarchical sampling
Finite population
Time series

Participants will learn how to use resampling methods:
* to compute standard errors,
* to check the accuracy of the usual Gaussian-based methods,
* to compute both quick and more accurate confidence intervals,
* for a variety of statistics and
* for a variety of sampling methods, and
* to perform significance tests in some settings.
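A two-sample permutation test, the heart of the significance-testing objective above, is equally compact. In this pure-Python sketch (group sizes and the effect size are illustrative), group labels are repeatedly shuffled and the mean difference recomputed:

```python
import random

def perm_test(x, y, reps=5000):
    """Two-sided permutation p-value for a difference in means.

    Shuffle the pooled observations, split them into groups of the original
    sizes, and count how often the shuffled difference in means is at least
    as extreme as the observed one.
    """
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = x + y
    hits = 0
    for _ in range(reps):
        random.shuffle(pooled)
        xs, ys = pooled[:len(x)], pooled[len(x):]
        if abs(sum(xs) / len(xs) - sum(ys) / len(ys)) >= observed:
            hits += 1
    return hits / reps

random.seed(3)
treat = [random.gauss(1.5, 1) for _ in range(40)]   # clearly shifted group
ctrl = [random.gauss(0.0, 1) for _ in range(40)]
same = [random.gauss(0.0, 1) for _ in range(40)]    # no real difference

p_diff = perm_test(treat, ctrl)   # small: the groups genuinely differ
p_null = perm_test(same, ctrl)    # larger: consistent with no difference
print(p_diff, p_null)
```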

About the Instructor

Dr. Tim Hesterberg is a Senior Statistician at Google. He previously worked at Insightful (S-PLUS), Franklin & Marshall College, and Pacific Gas & Electric Co. He received his Ph.D. in Statistics from Stanford University, under Brad Efron.

Hesterberg is author of the "Resample" package for R and primary author of the "S+Resample" package for bootstrapping, permutation tests, the jackknife, and other resampling procedures. He is co-author of Chihara and Hesterberg, "Mathematical Statistics with Resampling and R" (2011), lead author of "Bootstrap Methods and Permutation Tests" (2010, W. H. Freeman, ISBN 0-7167-5726-5), and author of technical articles on resampling.

Hesterberg is on the executive boards of the National Institute of Statistical Sciences and the Interface Foundation of North America (Interface between Computing Science and Statistics).

He teaches kids to make water bottle rockets, leads groups of high school students to set up computer labs abroad, and actively fights climate chaos.

Relevance to Conference Goals

Resampling methods are important in statistical practice, but have been omitted or poorly covered in many old-style statistics courses. These methods are an important part of the toolbox of any practicing statistician.

It is important when using these methods to have some understanding of the ideas behind these methods, to understand when they should or should not be used.

They are not a panacea. People tend to think of bootstrapping in small samples, when they don't trust the central limit theorem. However, the common combination of the nonparametric bootstrap and percentile intervals is actually less accurate than t procedures in small samples. We discuss why, remedies, and better procedures that are only slightly more complicated.

These tools also show how poor common rules of thumb are -- in particular, n >= 30 is woefully inadequate for judging whether t procedures should be OK.

Software Packages

I mention the R resample and boot packages, but this is not a focus of the course.