Add-Ons

JSM sessions that require ticket purchase have limited availability and therefore are subject to sell-out or cancellation. Below are the functions that still have availability. Although this list is updated in real time, please bear in mind that tickets are sold online around the clock; if you plan to purchase a function ticket onsite and see the function on this list before you travel to JSM, we cannot guarantee it will still be available for purchase when you arrive. To find out how many tickets remain for a particular function, please contact the ASA at (703) 684-1221.

Available Add-Ons


Continuing Education and Computer Technology Workshops

CE_01C CE: (TWO-DAY COURSE) Statistical Analysis with Missing Data
INSTRUCTOR(S): Roderick Little and Trivellore Raghunathan

This short course will discuss methods for the statistical analysis of data sets with missing values. Topics will include the definition of missing data; assumptions about mechanisms, including missing at random; pros and cons of simple methods such as complete-case analysis, naïve imputation, etc.; weighting methods; multiple imputation; maximum likelihood and Bayesian inference with missing data; computational techniques, including EM algorithm and extensions and Gibbs sampler; software for handling missing data; missing data in common statistical applications, including regression, repeated-measures analysis, and clinical trials; and selection and pattern-mixture models for nonrandom nonresponse. The course requires knowledge of standard statistical models such as the multivariate normal, multiple linear regression, and contingency tables, as well as matrix algebra, calculus, and basic maximum likelihood for common distributions. Recommended text: Little, R.J., and Rubin, D.B. (2002), Statistical Analysis with Missing Data, 2nd edition, Wiley.
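
As a concrete illustration of the multiple-imputation workflow described above, here is a minimal sketch using the mice package in R; the course does not prescribe particular software, and mice with its built-in nhanes data is used here only as an example.

```r
# Multiple imputation with chained equations (mice); nhanes ships with mice.
library(mice)

imp  <- mice(nhanes, m = 5, method = "pmm", seed = 123)  # 5 imputed data sets
fits <- with(imp, lm(bmi ~ age + chl))                   # analyze each completed set
summary(pool(fits))                                      # combine via Rubin's rules
```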


CE_02C CE: Applied Longitudinal Analysis
INSTRUCTOR(S): Garrett Fitzmaurice

**Excellence-in-CE Award winner.**
The goal of this course is to provide a broad introduction to statistical methods for analyzing longitudinal data. The emphasis is on the practical aspects of longitudinal analysis. I begin the course with a review of established methods for longitudinal data analysis when the response of interest is continuous and present a general introduction to linear mixed effects models for continuous responses. Next, I discuss how smoothing and semiparametric regression allow greater flexibility for the form of the relationship between the mean response and covariates. I demonstrate how the mixed model representation of penalized splines makes this extension straightforward. When the response of interest is categorical (e.g., binary or count data), two main extensions of generalized linear models to longitudinal data have been proposed: “marginal models” and “generalized linear mixed models.” While both classes account for the within-subject correlation among the repeated measures, they differ in approach. I will highlight the main distinctions between these two types of models and discuss the types of scientific questions addressed by each.
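
For readers who want to see what a linear mixed effects model looks like in practice, here is a minimal sketch using the lme4 package in R; the package, data set, and model are illustrative stand-ins, not materials from the course.

```r
# Random intercept and slope per subject; sleepstudy ships with lme4.
library(lme4)

fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
summary(fit)

# Generalized linear mixed model analogue for a binary response:
# glmer(y ~ time + (1 | subject), family = binomial, data = mydata)
```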


CE_03C CE: The Art and Science of Data Visualization Using R
INSTRUCTOR(S): Abel Rodriguez

"A picture is worth a thousand words" is a well known adage that explains why data visualization has been extremely popular in an age of cheap computers and large amounts of complex, high dimensional data. Visualizations such as histograms, boxplots and scatterplots are a key component of exploratory data analysis, which is taught on every introductory statistics course. However, modern data visualization goes well beyond these relatively simple tools that all statisticians are accustomed to. Furthermore, creating really effective visualizations for complex data requires knowledge of basic notions in cognitive psychology and graphic design that are not a standard part of most statistics or machine learning programs.

The aim of this short course is to introduce participants to concepts such as pre-attentive processing, the hierarchy of visual cues and color perception, and how they can be used to create more effective visualizations. Although the concepts we discuss are generally applicable and software-independent, we demonstrate them using R, and provide numerous examples of code that participants will be able to adapt for their own purposes. Furthermore, the course is built around a large number of examples and case studies that illustrate both best and worst practices in visualization design.
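
As a small taste of the kind of R demonstration described above, the sketch below uses ggplot2 to exploit one pre-attentive cue, color, so that group membership pops out without serial scanning; the specific example is ours, not the instructor's.

```r
# Color as a pre-attentive cue; mpg ships with ggplot2.
library(ggplot2)

ggplot(mpg, aes(displ, hwy, colour = class)) +
  geom_point(size = 2) +
  labs(x = "Engine displacement (L)", y = "Highway mpg",
       colour = "Vehicle class")
```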


CE_04C CE: Statistical Issues in Online Experimentation
INSTRUCTOR(S): Roger Longbotham and Alex Deng

Large websites such as Google, Amazon, Facebook, Yahoo!, Bing, and many others run tens of thousands of statistically valid experiments every year to test changes to their sites. Medium-sized and smaller sites are also learning the importance of testing changes to their sites. We will discuss the statistical and technical issues specific to running experiments in this space and present some open issues for researchers. Some of the topics we will cover are: application of standard statistical experimentation principles to online testing; parametric and nonparametric methods; Bayesian analysis; testing for interactions among concurrent experiments; multi-arm bandit testing and adaptive traffic allocation; A/A and diagnostic tests; common experimentation platforms for conducting experiments; best practices for online experimentation; common traps and pitfalls to avoid.

At the conclusion of this course, attendees will have the statistical information they need to successfully conduct valid experiments on the web. We will also give researchers open problems to be solved.
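
As a minimal illustration of the standard machinery involved, a two-sample test of conversion proportions can be run in base R; the counts below are hypothetical.

```r
# Hypothetical A/B test: conversions out of visitors, control vs. treatment.
conversions <- c(210, 246)
visitors    <- c(10000, 10000)
prop.test(conversions, visitors)   # test of equal proportions, with a CI
```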


CE_05C CE: Statistical Analysis of Financial Data with R
INSTRUCTOR(S): David S. Matteson and David Ruppert

The analysis of financial and econometric data is typified by non-Gaussian multivariate observations that exhibit complex dependencies: heavy-tailed and skewed marginal distributions are commonly encountered; serial dependence, such as autocorrelation and conditional heteroscedasticity, appears in time-ordered sequences; and non-linear, higher-order, and tail dependence are widespread.

This course will introduce statistical methods for the analysis of financial data. Examples and case studies will illustrate the application of these methods using the freely available software language R and numerous contributed packages.

The first half of the course will include: assessing departures from normality; modeling univariate and multivariate data; copula models and tail dependence. The second half will provide an introduction to univariate and multivariate time series modeling including: Autoregressive Moving Average (ARMA), Generalized Autoregressive Conditional Heteroscedastic (GARCH), and Stochastic Volatility (SV) models.
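
Here is a base-R sketch of two of these steps, assessing departures from normality and fitting an ARMA model, using simulated heavy-tailed returns; it is illustrative only, and the course's own examples also use contributed packages.

```r
# Simulate ARMA(1,1) returns with t-distributed (heavy-tailed) innovations.
set.seed(42)
returns <- arima.sim(list(ar = 0.2, ma = 0.1), n = 1000, rand.gen = rt, df = 4)

qqnorm(returns); qqline(returns)            # heavy tails bend away from the line
fit <- arima(returns, order = c(1, 0, 1))   # ARMA(1,1) fit
fit
```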

The prerequisites are knowledge of calculus, vectors, and matrices, as well as probability models, mathematical statistics, and regression at the level typical of third- or fourth-year undergraduates in statistics, mathematics, engineering, and related disciplines. Prior experience using R is helpful, but not necessary.


CE_06C CE: (HALF-DAY COURSE) Integrative Analytics of Different Types of Genetic and Genomic Data Using Causal Mediation Modeling
COSPONSOR: Section on Statistics in Genomics and Genetics

INSTRUCTOR(S): Xihong Lin and Yen-Tsung Huang

With the wide availability of genetic and genomic data, single-platform analyses, such as those of genome-wide association studies and differential expression microarray studies, have become fairly standard in biomedical research. They have identified numerous disease susceptibility genetic loci and gene signatures. Despite these successes, it is of increasing scientific interest to integrate multiplatform genetic and genomic data to investigate how different genomic features work together to affect a phenotypic trait and disease. Mediation analysis provides an attractive framework to investigate such a biological mechanism. This course provides an introduction to the causal mediation model and an overview of recent developments in integrative genomic analytics using this framework. Topics include a review of basic genomic biology, causal mediation modeling and its assumptions, framing integrative genomics as a mediation problem, statistical methods for integrating multiple genomic data, estimation and hypothesis testing for direct, indirect, and total effects, statistical methods for mediation analysis of family genomic studies, integrated analyses for separate studies, and statistical methods for handling multiple mediators. Data examples will be provided and software will be discussed. Prerequisites are beginning graduate or senior undergraduate coursework in applied and mathematical statistics. Prior knowledge of biology or genetics is not required, but helpful.
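
To make the mediation decomposition concrete, here is a base-R sketch of the product-of-coefficients idea with simulated, hypothetical data; a real analysis requires the causal assumptions the course covers.

```r
# Hypothetical mediation chain: genetic variant -> gene expression -> phenotype.
set.seed(1)
n    <- 500
snp  <- rbinom(n, 2, 0.3)                  # exposure
expr <- 0.5 * snp + rnorm(n)               # mediator
y    <- 0.3 * expr + 0.2 * snp + rnorm(n)  # outcome

a     <- coef(lm(expr ~ snp))["snp"]       # exposure -> mediator path
fit_y <- lm(y ~ snp + expr)
b     <- coef(fit_y)["expr"]               # mediator -> outcome path
c(direct = unname(coef(fit_y)["snp"]), indirect = unname(a * b))
```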


CE_07C CE: Advanced R
INSTRUCTOR(S): Hadley Wickham

This class is a good fit for persons who have some experience programming in R already. You should have written a number of functions and be comfortable with R’s basic data structures (vectors, matrices, arrays, lists, and data frames). You will find the course particularly useful if you’re an experienced R user looking to take the next step, or if you’re moving to R from other programming languages and you want to quickly get up to speed with R’s unique features. The course will give you a solid grounding in R programming techniques. We’ll start by reinforcing the foundations of your R knowledge, and then go on to cover the three main paradigms of R programming: functional programming, object-oriented programming, and metaprogramming.
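
Here is a flavor of the functional-programming paradigm mentioned above, in base R; it is an illustrative sketch, not course material.

```r
# Closures: functions that make functions.
power_fn <- function(exp) function(x) x ^ exp
square <- power_fn(2)
cube   <- power_fn(3)
square(4)  # 16
cube(2)    # 8

# Higher-order functions replace explicit loops:
Map(function(f, x) f(x), list(square, cube), list(5, 5))
```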

Bring a laptop, the latest version of R, and a recent version of the RStudio IDE. You’ll download an (electronic) copy of all slides, code, and data during the class.


CE_08C CE: Adaptive Methods for Modern Clinical Trials
COSPONSOR: Biopharmaceutical Section

INSTRUCTOR(S): Byron Jones, Frank Bretz, and Guosheng Yin

Clinical trials play a critical role in pharmaceutical drug development. New trial designs often depend on historical data, which may not be accurate for the current study due to changes in study populations, patient heterogeneity, or medical facilities. As a result, the original plan and study design may need to be adjusted, or even altered, to accommodate new findings and unexpected interim results. The goal of using adaptive methods in clinical trials is to enhance the flexibility of trial conduct and maintain the integrity of trial findings. Through carefully thought-out and planned adaptation, the right dose can be identified faster, patients can be treated more effectively, and treatment effects can be evaluated more efficiently. The net result makes for a more expeditious drug-development process. From a practical perspective, this one-day short course will introduce various adaptive methods for phase I to phase III clinical trials. Accordingly, different types of adaptive designs will be introduced and illustrated with case studies. This includes dose escalation/de-escalation and dose insertion based on observed data; adaptive dose-finding studies using optimal designs to allocate new cohorts of patients based on the accumulated evidence; population-enrichment designs; early stopping for toxicity, futility, or efficacy using group-sequential designs; blinded and unblinded sample size re-estimation; and adaptive designs for confirmatory trials with treatment or population selection at interim.


CE_09C CE: Analysis of Categorical Data
COSPONSOR: Biometrics Section

INSTRUCTOR(S): Christopher Bilder and Thomas Loughin

We live in a categorical world! From a positive or negative disease diagnosis to choosing all items that apply in a survey, outcomes are frequently organized into categories so people can more easily make sense of them. In this course, participants will learn how to analyze the most common types of categorical data. The course is divided into four main sections. The first three sections are organized by response type: 1) binary/binomial, 2) multicategory, and 3) count. Within each section, we will examine how to estimate and interpret appropriate models while giving practical advice on their use. The fourth section applies model selection and evaluation methods to those models discussed in the first three. Focus will be on variable selection, evaluation of model fit, and solutions to overdispersion. The ideal background for participants is experience with multiple linear regression and the application of likelihood-based methods (particularly Wald and likelihood-ratio methods). All computations will be performed using R. Familiarity with the basics of R, including object types and the use of functions, is recommended.
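
A minimal base-R example of the first response type, a binary outcome modeled by logistic regression, using built-in data purely as a stand-in:

```r
# Logistic regression for a binary response (transmission type in mtcars).
fit <- glm(am ~ wt + hp, family = binomial, data = mtcars)
summary(fit)
exp(coef(fit))   # odds ratios

# Count-response analogue: glm(y ~ x, family = poisson); compare against
# family = quasipoisson as an informal check for overdispersion.
```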


CE_10C CE: Practical Bayesian Computation
COSPONSOR: Section for Statistical Programmers and Analysts

INSTRUCTOR(S): Fang Chen

This one-day course reviews the basic concepts of Bayesian inference and focuses on the practical use of Bayesian computational methods. The objectives are to familiarize statistical programmers and practitioners with the essentials of Bayesian computing and equip them with computational tools through a series of worked-out examples that demonstrate sound practices for a variety of statistical models and Bayesian concepts. The first part of the course will review differences between classical and Bayesian approaches to inference, fundamentals of prior distributions, and concepts in estimation. The course also will cover MCMC methods and related simulation techniques, emphasizing the interpretation of convergence diagnostics in practice. The rest of the course will take a topic-driven approach that introduces Bayesian simulation and analysis and illustrates the Bayesian treatment of a wide range of statistical models using software with code explained in detail. The course will present major application areas and case studies, including multi-level hierarchical models, multivariate analysis, nonlinear models, meta-analysis, latent variable models, and survival models. Special topics include Monte Carlo simulation, sensitivity analysis, missing data, model assessment and selection, variable subset selection, and prediction. Examples will be done using SAS (PROC MCMC), with a strong focus on technical details. Attendees should have a background equivalent to an MS in applied statistics. Previous exposure to Bayesian methods is useful, but not required. Familiarity with material at the level of the following textbook is appropriate: DeGroot and Schervish, Probability and Statistics (Addison-Wesley).
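
The course's computations use SAS (PROC MCMC). Purely to illustrate what an MCMC sampler does, here is a minimal random-walk Metropolis sketch in base R for a normal-mean posterior with known standard deviation:

```r
set.seed(1)
y <- rnorm(30, mean = 2, sd = 1)
log_post <- function(mu)                       # log posterior up to a constant
  sum(dnorm(y, mu, 1, log = TRUE)) + dnorm(mu, 0, 10, log = TRUE)

draws <- numeric(5000); mu <- 0
for (i in seq_along(draws)) {
  prop <- mu + rnorm(1, 0, 0.5)                # propose a random-walk move
  if (log(runif(1)) < log_post(prop) - log_post(mu)) mu <- prop  # accept/reject
  draws[i] <- mu
}
mean(draws[-(1:1000)])   # posterior mean after burn-in; inspect trace plots too
```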


CE_11C CE: (HALF-DAY COURSE) Functional Data Analysis—Methods and Computing
INSTRUCTOR(S): Giles Hooker

Functional Data Analysis (FDA) is a class of statistical methods that apply to repeated, complex processes. Classical examples include motion capture data in which subjects repeat the same action several times. These actions are recorded with very high frequency and accuracy, but will differ from repeat to repeat and subject to subject. Because the underlying processes are complex, the recorded actions are modeled as nonparametric functions of time; methods in FDA are used to describe variation between curves and relationships between these curves and other quantities. While motion capture data serves as a useful motivation, FDA can be applied in a variety of applications and does not require precise or high-frequency measurements of every curve. This course will introduce participants to the statistical methods of functional data analysis modeling and computational tools to carry them out. We will briefly review techniques for nonparametric smoothing to represent individual functions before developing methods to describe distributions of functions and variation between them. The course will examine extensions of linear regression, generalized linear models, and generalized additive models to the cases where functional data serve either as covariates or a response. We also will describe methods in two areas unique to FDA: curve alignment—in which we try to match the timing of features between different curves—and dynamics—in which a derivative of the function, or relationships between derivatives, serves as the relevant object of interest. The course will provide example code that makes heavy use of the fda package in R; further software resources that replicate or extend this functionality will be cited.
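
The course's example code uses the fda package; as a package-free sketch of the same two ideas, smoothing individual curves and describing variation between them, here is a base-R illustration on simulated curves:

```r
# Simulate 20 noisy curves observed on a common grid.
set.seed(2)
t_grid <- seq(0, 1, length.out = 100)
curves <- replicate(20, sin(2 * pi * t_grid) * runif(1, 0.5, 1.5) +
                        rnorm(100, sd = 0.2))

# Smooth each curve, then explore between-curve variation via PCA of the fits
# (a stand-in for functional principal component analysis).
smoothed <- apply(curves, 2,
                  function(y) predict(smooth.spline(t_grid, y), t_grid)$y)
pca <- prcomp(t(smoothed))
summary(pca)$importance[, 1:3]   # variance explained by leading components
```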


CE_12C CE: (HALF-DAY COURSE) Concepts and Implementation of Bayesian Adaptive Phase I Oncology Trials
COSPONSOR: Biopharmaceutical Section

INSTRUCTOR(S): Satrajit Roychoudhury and Beat Neuenschwander

Phase I trials in oncology are usually small adaptive dose-escalation trials. The aim is to approximately understand the dose-toxicity profile of a drug, and, eventually, to find a reasonably safe dose for future testing. Much statistical research for phase I trials has accumulated over the past 25 years, with modest impact on statistical practice. The vast majority of trials still follow the 3+3 design, despite it often missing the targeted dose (poor operating characteristics) and failing to provide a real understanding of true toxicity rates (no statistical inference). We present a comprehensive and principled statistical approach. The implementation is Bayesian, with the following main parts: a parsimonious model for the dose-toxicity relationship, the possibility to incorporate contextual information (“historical data”) via priors, and safety-centric metrics (overdose probabilities) that inform dose adaptations under appropriate overdose control. After basic clinical and statistical considerations, we introduce the statistical methodology for the single-agent setting, and then extend it to dual and triple combinations. Applications and a discussion of implementation issues (such as basic WinBUGS code) complement this training and provide practical insights into phase I trials.
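
To make the overdose-control idea concrete, the base-R sketch below computes overdose probabilities from hypothetical posterior draws of a two-parameter logistic dose-toxicity model; the draws are stand-ins for a fitted posterior, not output from the course's methods.

```r
set.seed(1)
alpha <- rnorm(4000, -1, 1)       # hypothetical posterior draws: intercept
beta  <- rlnorm(4000, 0, 0.5)     # hypothetical posterior draws: log-dose slope
doses <- c(1, 2.5, 5, 10)

ptox <- sapply(doses, function(d) plogis(alpha + beta * log(d)))
overdose <- colMeans(ptox > 0.33) # P(toxicity rate is excessive), per dose
names(overdose) <- doses
overdose                          # escalate only where this probability stays small
```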


CE_13C CE: Joint Modeling of Longitudinal and Survival Data
COSPONSOR: Section on Statistics in Epidemiology

INSTRUCTOR(S): Joseph Ibrahim

We will examine in-depth statistical methods for joint modeling of longitudinal and survival data. Both frequentist and Bayesian approaches will be examined. The types of joint models to be discussed are selection models, pattern mixture models, and shared parameter models. We will discuss linear mixed models and generalized linear mixed models for the longitudinal component, and Cox-type, piecewise constant hazard, and cure rate models for the survival component. Both univariate and multivariate survival models will be discussed, as well as multivariate longitudinal models. Several types of applications also will be discussed, including ones in cancer, vaccine studies, quality-of-life studies, and AIDS research. Missing data issues also will be examined. SAS, WinBUGS, and R software for fitting joint models will be illustrated in detail.
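
One R option for shared parameter models is the JM package; the sketch below is purely illustrative, and the data frames long_df (repeated measures) and surv_df (one row per subject) with their columns are hypothetical.

```r
# Shared parameter joint model: a linear mixed model for the marker linked
# to a Cox model for the event time.
library(nlme); library(survival); library(JM)

lme_fit <- lme(marker ~ time, random = ~ time | id, data = long_df)
cox_fit <- coxph(Surv(eventtime, status) ~ treat, data = surv_df, x = TRUE)

joint_fit <- jointModel(lme_fit, cox_fit, timeVar = "time")
summary(joint_fit)
```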


CE_14C CE: Analysis of Clinical Trials: Theory and Applications
COSPONSOR: Biopharmaceutical Section

INSTRUCTOR(S): Alex Dmitrienko, Devan Mehrotra, and Jeff Maca

This course covers six important topics that commonly face statisticians and research scientists conducting clinical research: analysis of stratified trials, analysis of longitudinal data with dropouts and potential outliers, analysis of time-to-event data (with emphasis on small trials), crossover trials, multiple comparisons, and interim decisionmaking and adaptive designs. Offering a well-balanced mix of theory and applications, it presents practical advice from experts and discusses regulatory considerations. The discussed statistical methods will be implemented using SAS and R software. Clinical trial examples will be used to illustrate the statistical methods. The course is designed for statisticians working in the pharmaceutical or biotechnology industries, as well as contract research organizations. It is equally beneficial to statisticians working in institutions that deliver health care and government branches that conduct health care-related research. Attendees are required to have basic knowledge of clinical trials. Familiarity with drug development is highly desirable, but not necessary.

This course was taught at JSM 2005-2014 and received the Excellence in Continuing Education Award in 2005.


CE_15C CE: Classification and Regression Trees and Forests
INSTRUCTOR(S): Wei-Yin Loh

It has been more than 50 years since AID (Morgan and Sonquist, 1963) and more than 30 years since CART (Breiman et al., 1984) appeared. Rapidly increasing use of trees among practitioners has led to great advances in algorithmic research over the last two decades. Modern tree models have higher prediction accuracy and do not have selection bias. They can fit linear models in the nodes using GLM, quantile, and other loss functions; response variables may be multivariate, longitudinal, or censored; and classification trees can employ linear splits and fit kernel and nearest-neighbor node models. This course begins with examples to compare tree and traditional models. Then it reviews the major algorithms, including AID, CART, C4.5, CHAID, CRUISE, CTREE, GUIDE, M5, MOB, and QUEST. Real data are used to illustrate the features of each, and results on prediction accuracy and model complexity versus forests and some machine learning methods are presented. Examples are drawn from business, science, and industry and include applications to subgroup identification for personalized medicine, missing value imputation in surveys, and differential item functioning in educational testing. Relevant software is mentioned where appropriate. Attendees should be familiar with multivariate analysis at the level of Johnson and Wichern’s "Applied Multivariate Statistical Analysis."
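
Here is a minimal R illustration of a single tree and a forest, using the rpart and randomForest packages on built-in data; the course itself surveys many more algorithms and implementations.

```r
# A CART-style classification tree and a random forest on iris.
library(rpart); library(randomForest)

tree <- rpart(Species ~ ., data = iris)
printcp(tree)                                   # complexity/pruning table

forest <- randomForest(Species ~ ., data = iris, ntree = 500)
forest$confusion                                # out-of-bag error by class
```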


CE_16C CE: Guidelines for Using State-of-the-Art Methods to Estimate Propensity Score and Inverse Probability of Treatment Weights When Drawing Causal Inferences
INSTRUCTOR(S): Lane Burgette and Daniel F. McCaffrey

Estimation of causal effects is a primary activity of many studies. Examples include testing whether a substance abuse treatment program is effective, whether an intervention improves the quality of mental health care, or whether incentives improve retention of military service members. Controlled experiments are the gold standard for estimating such effects. However, experiments are often infeasible, forcing analysts to rely on observational data in which treatment assignments are out of the control of the researchers. This short course will provide an introduction to causal modeling using the potential outcomes framework and the use of propensity scores and weighting (i.e., propensity score or inverse probability of treatment weights) to estimate causal effects from observational data. It also will present step-by-step guidelines for estimating and performing diagnostic checks of the estimated weights for testing the relative effectiveness of two or more interventions and the cumulative effects of time-varying interventions. Attendees will gain hands-on experience estimating propensity score weights using boosted models and covariate balancing propensity scores in R, SAS and Stata; evaluating the quality of those weights; and using them to estimate intervention effects. Attendees should be familiar with linear and logistic regression; no knowledge of propensity scores is expected.
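
Below is a bare-bones sketch of inverse probability of treatment weighting with a logistic-regression propensity model in base R; the data frame d and its variables are hypothetical, and the boosted approach taught in the course is implemented in packages such as twang.

```r
# Propensity scores via logistic regression (hypothetical covariates).
ps_fit <- glm(treat ~ age + severity + prior_care, family = binomial, data = d)
e <- fitted(ps_fit)

# ATE weights: 1/e for treated, 1/(1 - e) for controls.
w <- ifelse(d$treat == 1, 1 / e, 1 / (1 - e))
summary(w)                                      # diagnostic: watch for extreme weights

coef(lm(outcome ~ treat, data = d, weights = w))["treat"]  # weighted effect estimate
```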


CE_17C CE: Applied Text Analytics
INSTRUCTOR(S): James Wisnowski

The explosion of sensors in the internet of things has led to a dramatic increase in data volume in the past few years. A disproportionate amount of this is unstructured data such as text, voice recordings, and images. While enterprise data may be analyzed in classic row-by-column format, much of the unstructured data remains unexplored in most organizations. This short course will provide an overview of new, easily implemented methods to find previously unknown relationships from a collection of text documents. Data mining techniques are also explored with text from sources such as tweets, voice-to-text translations, email, survey comments, incident reports, free-form data fields, websites, research reports, blogs, and other social media to discover potentially useful and actionable business insights. We will provide demonstrations using data sets with applications to financial services, aerospace/defense, medical, and other industries representative of ASA researchers. This will be a hands-on workshop in which participants are provided R code and packages to immediately implement text mining methods and discover meaningful structure from text fields. We will go through end-to-end examples, starting from assembling disparate text sources, followed by creating a structured database with the document term matrix, then reducing the dimensionality of the problem with a rank-reduced singular value decomposition, and concluding by applying data mining methods such as decision trees, regression, and cluster analysis to discover useful relationships to integrate into standard structured data. While relevant theory will be addressed, the focus of the course will be on giving participants an appreciation for the practical application of text mining to real-world applications. We will focus on R and demonstrate the use of SAS TextMiner, along with an integration of R into a common statistical analysis package to allow for rapid discovery.
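
Here is a minimal sketch of the pipeline described above, from raw text to a document-term matrix to a rank-reduced SVD, using the tm package in R with toy documents of our own invention:

```r
library(tm)
docs <- c("claims rose sharply", "claims fell", "rates rose")  # toy corpus

corp <- VCorpus(VectorSource(docs))
corp <- tm_map(corp, content_transformer(tolower))
corp <- tm_map(corp, removeWords, stopwords("en"))
dtm  <- DocumentTermMatrix(corp)

lsa <- svd(as.matrix(dtm))   # latent semantic structure
lsa$d                        # singular values: keep only the leading few
```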


CE_18C CE: (HALF-DAY COURSE) Bayesian Structural Time Series
COSPONSOR: Section on Statistical Computing

INSTRUCTOR(S): Steven Scott

Structural time series models are a natural, practical, and useful alternative to classic Box-Jenkins ARIMA models. Because structural time series are defined through a set of latent variables, they have a natural Bayesian interpretation. This course introduces the basic ideas of structural time series (i.e., decomposing the model into interpretable components of state) and the fundamental tools for computing with them (mainly the Kalman filter). In the modern Big Data computing environment, it is helpful to think about time series models that contain a regression component, so a target time series can be predicted based on other series whose values are known in advance. Examples of where this technique is useful include causal modeling, economic time series released (and revised) with a lag, and handling calendar effects by including them as regression components with deterministic predictors. The number of potential predictor series can be quite large, so it is natural to consider Bayesian spike and slab priors for introducing model sparsity. All concepts taught in this course have been implemented in the "bsts" R package, which is freely downloadable from CRAN under the GNU public license.
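
A minimal bsts sketch, assuming y is a numeric time series of daily observations: a local linear trend plus a weekly seasonal component, followed by a posterior predictive forecast.

```r
library(bsts)

ss <- AddLocalLinearTrend(list(), y)     # trend component of state
ss <- AddSeasonal(ss, y, nseasons = 7)   # weekly seasonality
model <- bsts(y, state.specification = ss, niter = 1000)

pred <- predict(model, horizon = 14)     # two-week posterior predictive forecast
plot(pred)
```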


CE_19C CE: (HALF-DAY COURSE) Quantile Regression in Practice
INSTRUCTOR(S): Yonggang Yao

Quantile regression is a modern statistical approach for modeling the quantiles of a response variable conditional on explanatory covariates. Compared with ordinary least squares linear regression (which models the conditional mean), quantile regression enables you to more fully explore your data by modeling a set of conditional quantiles, such as the median and 5th and 95th percentiles. Quantile regression is particularly useful when your data are heterogeneous, or when you cannot assume a parametric distribution for the response. Quantile process regression fits quantile regression models for the entire range of quantile levels in [0,1] and enables you to estimate the conditional distribution of a response variable. Applications include risk analysis, conditional ranking, and sample selection. This tutorial provides an overview of the theoretical concepts of quantile regression and emphasizes its practical benefits as both a regression method and a distribution estimation method. This tutorial uses a variety of examples to illustrate the following topics: 1. motivation for and basic concepts of quantile regression, 2. comparison of quantile regression with linear regression, 3. inference with quantile regression, 4. quantile process regression, and 5. model selection for quantile regression. Participants are assumed to be familiar with linear algebra and linear regression. Computations are done with SAS.
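
The tutorial's computations are done with SAS; as an R analogue of the core idea, the quantreg package fits several conditional quantiles at once (the engel data ship with quantreg):

```r
# Food expenditure vs. household income at the 5th, 50th, and 95th percentiles.
library(quantreg)
data(engel)

fits <- rq(foodexp ~ income, tau = c(0.05, 0.5, 0.95), data = engel)
coef(fits)   # slopes differ across quantile levels when data are heterogeneous
```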


CE_20C CE: (HALF-DAY COURSE) Dynamic Treatment Regimes, Sequentially Randomized Trials, and Causal Inference
COSPONSOR: Biometrics Section

INSTRUCTOR(S): Erica Moodie and Bibhas Chakraborty

Effective treatment of many chronic disorders such as mental illnesses, cancer, and HIV infection typically requires ongoing medical intervention in which clinicians sequentially make therapeutic decisions, adapting the type, dosage, and timing of treatment according to patient characteristics. Dynamic treatment regimes (DTRs) operationalize the sequential decisionmaking process in personalized clinical practice. Constructing evidence-based DTRs from either observational or sequentially randomized trials comprises an important and challenging methodological area of statistical research. This half-day course will provide a comprehensive description of the field. We will begin with a discussion of relevant data sources (multi-stage sequentially randomized trials and longitudinal observational studies) and their relative advantages, as well as considerations for designing efficient studies that can produce high-quality data to aid the construction of DTRs. We will then turn our attention to estimation via a popular method called Q-learning. Next, we will consider inferential challenges in this area, and present some state-of-the-art methods for doing inference. We will cover a practical demonstration of estimation of optimal DTRs using Q-learning and associated inference, applying the R package qLearn. Finally, we will give a relatively quick overview of alternative estimation approaches.
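
Here is a skeleton of two-stage Q-learning in base R, independent of the qLearn package the course demonstrates; the data frame d, with history variables h1 and h2, treatments a1 and a2 coded in {-1, 1}, and final outcome y, is hypothetical.

```r
# Stage 2: regress the outcome on history, treatment, and their interaction.
q2 <- lm(y ~ h2 * a2, data = d)

# Pseudo-outcome: the predicted value under the better stage-2 treatment.
d$pseudo <- pmax(predict(q2, transform(d, a2 =  1)),
                 predict(q2, transform(d, a2 = -1)))

# Stage 1: regress the pseudo-outcome on stage-1 history and treatment.
q1 <- lm(pseudo ~ h1 * a1, data = d)
coef(q1)   # signs of the a1 terms drive the stage-1 decision rule
```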


CE_21C CE: Introduction to Statistical Learning for Unsupervised Problems
COSPONSOR: Section on Statistical Learning and Data Mining

INSTRUCTOR(S): Ali Shojaie

This course will provide a practical introduction to statistical learning methods for unsupervised problems. We will discuss three classes of methods: cluster analysis, dimension reduction, and graphical modeling. Specifically, we will first discuss hierarchical and K-means clustering methods. We will then discuss principal component analysis and multidimensional scaling as tools for reducing the ambient dimension of the data. Finally, we will discuss sparse graphical models for analysis of high-dimensional data, including data from Gaussian and non-Gaussian distributions. Throughout, we will emphasize practical applications of these methods and their limitations in high-dimensional settings, including validation of results of unsupervised learning methods and tools for reproducible research. We will discuss a number of case studies from finance and biology to describe various statistical learning methods. The course will incorporate material from "Elements of Statistical Learning" by Hastie et al, "Introduction to Statistical Learning" by James et al, and the instructor’s notes from two courses taught at the Summer Institute for Statistical Genetics (SISG).
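
Two of the three method classes above in miniature, using base R on built-in data (graphical models need contributed packages and are omitted here):

```r
x <- scale(iris[, 1:4])

km <- kmeans(x, centers = 3, nstart = 25)   # K-means clustering
hc <- hclust(dist(x), method = "average")   # hierarchical alternative; plot(hc)
pc <- prcomp(x)                             # dimension reduction via PCA

table(km$cluster, iris$Species)             # informal validation against labels
plot(pc$x[, 1:2], col = km$cluster)         # clusters viewed in PC space
```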


CE_22C CE: Managing Statistical Consulting Projects: Lessons from the Front
COSPONSOR: Section on Statistical Consulting

INSTRUCTOR(S): Michael Greene and David Steier

Beyond having good data and technology, what does it take to deliver analytic success on the ground? Using a framework for managing analytics projects developed from our consulting experience, this tutorial aims to offer statistical professionals practical guidance on the process of evaluating, initiating, and delivering statistical consulting projects. It will draw on case studies from the following topics: Starting an analytics conversation - framing the conversation on analytics to improve decisionmaking and drive value; evaluating the current state - prioritizing effort based on systematic assessment against a data and analytics maturity model; planning and organizing analytics projects and programs - analytics project planning and organizational models for multi-disciplinary teams; choosing data, analytics techniques, and enabling technologies - using problem constraints to drive architectural and algorithmic choices; managing delivery of analytics projects and programs - iterative prototyping and communication with stakeholders; and what to watch for - common warning signs and opportunities. The intended audience is anyone undertaking a statistical consulting project starting from the ground up. Prior analytic project management experience is not required, but attendees are encouraged to bring their own case studies and topics for discussion.


CE_23C CE: Software Engineering for Statisticians
COSPONSOR: Section on Bayesian Statistical Science

INSTRUCTOR(S): Murray Stokely

Statisticians are increasingly being employed alongside software engineers to make sense of the large amounts of data collected in modern e-commerce, Internet, retail, and advertising companies. This course introduces a number of best practices in writing statistical software that are taught to computer scientists but are seldom part of a statistics degree. Revision control tools, unit testing, code modularity, structure, readability, and the basics of computer architecture and performance will be covered. A few examples of real R code written in a commercial environment will be shared and discussed to illustrate some of the problems of moving from working alone or in a small group in an academic setting into a team in a large commercial setting (the course is mostly language agnostic, but R will be used in some examples).


CE_24C CE: Statistical Methods for Ranking Data
INSTRUCTOR(S): Mayer Alvo and Philip Yu

Ranking data commonly arise when ranking a set of individuals or objects in accordance with some criterion. Such data may be observed directly, or they may come from a ranking of a set of assigned scores. Alternatively, ranking data may arise when transforming continuous or discrete data in a nonparametric analysis. Examples of ranking data may be found in politics, voting and elections, market research, psychology, health economics, food tasting, and even horse racing. Many statistical methods have been developed in recent decades for analyzing and modeling ranking data. These methods are by their nature nonparametric and consequently require no underlying assumptions on the distributions of the observed scores. In this course, participants will learn how ranking data can be analyzed for drawing inferences and how they can be modeled. Methods of handling missing data, incomplete rankings, and ties will be introduced. Most of these methods will be illustrated by application to real data sets. We will give computer demonstrations using a number of R packages, including StatMethRank, a companion R package to our book, Statistical Methods for Ranking Data (Springer).
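
Two base-R starting points for ranking data, run on a hypothetical matrix in which each of 10 judges ranks 4 objects (these are generic tools, separate from the StatMethRank package the course demonstrates):

```r
set.seed(7)
ranks <- t(replicate(10, sample(1:4)))       # rows: judges; columns: objects

friedman.test(ranks)                         # do the objects differ overall?
cor(ranks[1, ], ranks[2, ], method = "kendall")  # agreement between two judges
```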


CE_25C CE: (HALF-DAY COURSE) Introduction to Structural Equation Modeling and Its Applications
INSTRUCTOR(S): Yiu-Fai Yung

Structural equation modeling (SEM), which originated in the social sciences, is becoming popular in other research fields such as education, health science, and medicine. This course introduces statisticians to the general methodology of SEM and its applications. The course reviews traditional topics, including path analysis, confirmatory factor analysis, measurement error models, and linear structural relations with latent variables. Applications are illustrated with examples drawn from educational, behavioral, and marketing research. Two advanced SEM techniques, the analysis of total and indirect effects and multiple-group SEM, also are covered. The statistical theory of SEM is presented at a level suitable for general understanding of sound practice. This course is designed for statisticians and data analysts who want an introductory overview of SEM techniques for applications. Attendees will learn how to (1) specify structural equation models with or without latent variables, (2) interpret model fit statistics and estimation results, (3) estimate total and indirect effects, and (4) use model modification indices to fine-tune models. The CALIS procedure in SAS/STAT software is used to demonstrate model specifications and fitting. Attendees should have a basic understanding of regression analysis. Experience using SAS software is not required.
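
The course demonstrates SAS PROC CALIS; as an illustrative R analogue, the lavaan package fits the same kinds of latent-variable models (the HolzingerSwineford1939 data ship with lavaan):

```r
library(lavaan)

model <- '
  visual  =~ x1 + x2 + x3    # latent factors measured by observed items
  textual =~ x4 + x5 + x6
  visual ~~ textual          # factor covariance
'
fit <- sem(model, data = HolzingerSwineford1939)
summary(fit, fit.measures = TRUE, standardized = TRUE)
```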


CE_26C CE: (HALF-DAY COURSE) Meta-Analysis: Combining the Results of Multiple Studies
COSPONSOR: Health Policy Statistics Section

INSTRUCTOR(S): Christopher Schmid and Ingram Olkin

Meta-analysis enables researchers to synthesize the results of multiple studies designed to determine the effect of a treatment, device, or test. The information explosion and the movement toward requiring evidence to support policy decisions have promoted the use of meta-analysis in all scientific disciplines. Statisticians play a major role in meta-analysis because analyzing data with few studies and many variables is difficult. In this workshop, we introduce the major principles and techniques of statistical analysis of meta-analytic data. Examples of published meta-analyses in medicine and the social sciences will be used to illustrate the various methods.
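
The core computation, inverse-variance pooling under fixed- and random-effects models, fits in a few lines of base R; the effect estimates yi and variances vi below are hypothetical.

```r
yi <- c(0.30, 0.12, 0.45, 0.21, 0.08)   # study effect estimates (hypothetical)
vi <- c(0.04, 0.02, 0.09, 0.03, 0.01)   # their sampling variances

w <- 1 / vi
fixed <- sum(w * yi) / sum(w)                 # fixed-effect pooled estimate
Q <- sum(w * (yi - fixed)^2)                  # Cochran's heterogeneity statistic
tau2 <- max(0, (Q - (length(yi) - 1)) /
              (sum(w) - sum(w^2) / sum(w)))   # DerSimonian-Laird between-study variance
wr <- 1 / (vi + tau2)
c(fixed = fixed, random = sum(wr * yi) / sum(wr), tau2 = tau2)
```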


CE_27T CTW: Introducing the SAS® BCHOICE Procedure for Bayesian Choice Models
INSTRUCTOR(S): Amy Shi

Discrete choice models (DCMs) are widely popular in marketing research and related areas, where it is important to model the underlying process a consumer uses to choose products or reach decisions when faced with multiple alternatives. The rising popularity of DCMs in recent decades coincides with the growing presence in this area of Bayesian approaches, which offer modeling and computational conveniences that are otherwise difficult to obtain. This tutorial introduces the BCHOICE procedure in SAS/STAT® 13.1, which is designed to perform Bayesian analysis for discrete choice models. The BCHOICE procedure supports all three major choice models: logit, nested logit, and probit. Models can be extended to include random effects to estimate individual-level parameters, better enabling you to infer heterogeneity in product preferences and price sensitivity. The BCHOICE procedure obtains samples from posterior distributions and produces summary and diagnostic statistics. It provides a CLASS statement to handle categorical variables and uses parallel processing to ensure fast sampling. This tutorial illustrates important features of the BCHOICE procedure and shows how to use it for estimation, inference, and prediction through various examples.
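
PROC BCHOICE is Bayesian and SAS-specific; as a rough frequentist analogue of the basic logit choice model, conditional logistic regression can be fit in R with survival::clogit. The long-format data frame choice_df and its columns are hypothetical.

```r
# One row per alternative; chosen = 1 marks the picked option, and qid
# identifies each choice occasion (stratum).
library(survival)

fit <- clogit(chosen ~ price + brandA + brandB + strata(qid), data = choice_df)
summary(fit)   # a negative price coefficient indicates price sensitivity
```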


CE_28T CTW: Introduction to Data Mining with CART Classification and Regression Trees
INSTRUCTOR(S): Kaitlin Onthank and Ling Chen

This tutorial is intended for the applied statistician wanting to understand and apply the CART classification and regression trees methodology. The emphasis will be on practical data analysis and data mining involving classification and regression. All concepts will be illustrated using real-world, step-by-step examples. The course will begin with an intuitive introduction to tree-structured analysis: what it is, why it works, why it is nonparametric and model-free, and advantages in handling all types of data, including missing values and categorical variables. Working through examples, we will review how to read the CART Tree output and how to set up basic analyses. This session will include performance evaluation of CART trees and cover ways to search for possible improvements of the results. Once a basic working knowledge of CART has been mastered, the tutorial will focus on critical details for advanced CART applications, including choice of splitting criteria, choosing the best split, using prior probabilities to shape results, refining results with differential misclassification costs, the meaning of cross validation, tree growing, and tree pruning. The course will conclude with discussion about the comparative performance of CART versus other computer-intensive methods such as artificial neural networks and statistician-generated parametric models.

All attendees will receive six months of access to fully functional versions of the SPM Salford Predictive Modeler software suite.


CE_29T CTW: Predicting the Future Course of a Trial
INSTRUCTOR(S): Cyrus Mehta

Cytel president and co-founder, Cyrus Mehta, will explore new ways of predicting a trial’s future course based on interim data analysis and simulating the time path of a trial. Predictive interval plots (PIPs) are a series of repeated confidence intervals (RCIs) graphically depicting the future course of a group sequential clinical trial conditional on the current data (Evans, Li, and Wei, 2007, "Data Monitoring in Clinical Trials," DIA Journal 41:733-42). Sorted and stacked on top of one another, RCIs provide insights about the magnitude of the expected treatment effect at 1) each subsequent interim look and 2) the end of the trial. Data monitoring committees find PIPs especially useful when contemplating early termination of a trial for either efficacy or futility based on the totality of available evidence. Other new methods introduced are for predicting enrollment and event forecasting in time-to-event trials. East® PREDICT aggregates data from individual sites, models the arrival of patients and events, updates model parameters as new data become available, and simulates the time path (patient enrollment/event arrival) of the trial.


CE_30T CTW: Enter a Data Science Competition. You Don’t Need to Be an Expert!
INSTRUCTOR(S): Kaitlin Onthank and Ling Chen

How would you like to enter a data science competition like Kaggle or the KDD Cup? In this presentation designed for statisticians, we will show how you can quickly and easily create a model to achieve a top-performing result. We will demonstrate with step-by-step instructions and using multiple data sets from past competitions. At the end of this workshop, our goal is that you will be able to build TreeNet gradient boosting models that bring you within decimal places of winning solutions. Use this information as a starting point for future Kaggle competitions and KDD Cups.

All attendees will receive six months of access to fully functional versions of the SPM Salford Predictive Modeler software suite.


CE_31T CTW: Analyzing Item Responses with the IRT Procedure: An Introduction with Applications
INSTRUCTOR(S): Xinming An

Item response theory (IRT) is concerned with efficient scale development and accurate subject scoring. You design scale items to measure various abilities (e.g., math ability), traits (e.g., extraversion), or behavioral characteristics (e.g., purchasing tendency). Responses to scale items can be binary (e.g., correct or incorrect responses in ability tests) or ordinal (e.g., degree of agreement on Likert scales). Traditionally, IRT models have been applied successfully to analyze these types of data in psychological assessments and educational testing. More recently, IRT models have become popular in fields such as quality-of-life research, patient-reported outcome research, and marketing research. This workshop starts with an overview of IRT models such as Rasch, 1PL, and graded response, and then demonstrates their applications with real-data examples. This workshop shows how to use the newly developed IRT procedure (SAS/STAT® 13.1 or later) to calibrate items, interpret item characteristics, and score subjects. Finally, this workshop explains how the applications of IRT models can help develop better scales and improve subject scoring.
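
The logic behind these models can be seen directly in base R: a two-parameter logistic (2PL) item characteristic curve is just a logistic function of the latent trait. The sketch below is illustrative and independent of the SAS procedure.

```r
# Item characteristic curves: P(correct | theta) = logistic(a * (theta - b)).
theta <- seq(-4, 4, length.out = 200)   # latent trait
a <- c(0.8, 2.0)                        # discrimination of two items
b <- c(-0.5, 1.0)                       # difficulty of two items

icc <- sapply(1:2, function(j) plogis(a[j] * (theta - b[j])))
matplot(theta, icc, type = "l", lty = 1,
        xlab = "Latent trait", ylab = "P(correct)")
```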


CE_32T CTW: Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets
INSTRUCTOR(S): Kaitlin Onthank and Ling Chen

How would you like to use data mining in addition to classical statistical modeling? In this presentation designed for statisticians, we will show how you can quickly and easily create data mining models. We will demonstrate with step-by-step instructions. We will use real-world data mining examples drawn from online advertising and the financial services industries. At the end of this workshop, our goal is that you will be able to build your own data mining models on your own data sets. Data mining is a powerful extension to classical statistical analysis. As opposed to classical techniques, data mining easily finds patterns in data, nonlinear relationships, key predictors, and variable interactions that are difficult—if not impossible—to detect using standard approaches. This tutorial follows a step-by-step approach to introduce advanced automation technology, including CART, MARS, TreeNet Gradient Boosting, Random Forests, and the latest multi-tree boosting and bagging methodologies by the original creators of CART (Breiman, Friedman, Olshen, and Stone).

All attendees will receive six months of access to fully functional versions of the SPM Salford Predictive Modeler software suite.


CE_33T CTW: Modern Dose Escalation Trial Designs for Oncology in East®
INSTRUCTOR(S): Lingyun Liu

Cytel has developed new, industry-standard tools to design, simulate, and operationally support Phase 1 dose escalation trials to determine maximum-tolerated dose (MTD). In addition to the traditional 3+3 design, East® ESCALATE includes Bayesian adaptive approaches, including the modified toxicity probability interval (mTPI) method, the continual reassessment method (CRM), and the Bayesian logistic regression model (BLRM). We will begin by reviewing the underlying theories, and then apply the optimal statistical methods in the process of designing an actual trial.


CE_34T CTW: Practical Finite Mixture Modeling with SAS
INSTRUCTOR(S): Dave Kessler

Many situations call for a finite mixture model. You might want to use such a model when unobserved covariates influence the response, when multiple processes generate the data, or when the response simply has an unusual distribution. This workshop introduces finite mixture models and the SAS/STAT procedure for fitting them, PROC FMM. You will learn the basic form of the finite mixture model and how to use PROC FMM to fit zero-inflated Poisson models, hurdle models, overdispersion models for multinomial data, and other mixture models. These applications will be demonstrated through numerous examples and discussion of the motivation for different approaches. This workshop will show you how to identify situations that call for a finite mixture model, interpret the results of the analysis, and use these results to make better decisions. Familiarity with generalized linear models is helpful.
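
The model PROC FMM fits can be sketched in base R with a hand-rolled EM algorithm for a two-component normal mixture; this is illustrative only, as the procedure handles far more general mixtures directly.

```r
em_mix <- function(x, iters = 100) {
  p <- 0.5; mu <- unname(quantile(x, c(0.25, 0.75))); s <- rep(sd(x), 2)
  for (i in seq_len(iters)) {
    r <- p * dnorm(x, mu[1], s[1]) /
         (p * dnorm(x, mu[1], s[1]) + (1 - p) * dnorm(x, mu[2], s[2]))  # E-step
    p  <- mean(r)                                                       # M-step
    mu <- c(weighted.mean(x, r), weighted.mean(x, 1 - r))
    s  <- sqrt(c(weighted.mean((x - mu[1])^2, r),
                 weighted.mean((x - mu[2])^2, 1 - r)))
  }
  list(mixing = p, means = mu, sds = s)
}
em_mix(c(rnorm(300, 0, 1), rnorm(200, 4, 1)))   # recovers the two components
```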


CE_35T CTW: Evolution of Classification: From Logistic Regression and Decision Trees to Bagging/Boosting and Netlift Modeling
INSTRUCTOR(S): Kaitlin Onthank and Ling Chen

Not so long ago, modelers would use traditional classification, data mining, and decision tree techniques to identify a target population. We have come a long way in recent years. By incorporating modern approaches—including boosting, bagging, and netlift—there has been a giant leap in this arena. We will discuss recent improvements to conventional decision tree and logistic regression technology via two case study examples: one in direct marketing and the second in biomedical data analysis. Within the context of real-world examples, we will illustrate the evolution of classification by contrasting and comparing regularized logistic regression, CART, random forests, TreeNet stochastic gradient boosting, and RuleLearner.

All attendees will receive six months of access to fully functional versions of the SPM Salford Predictive Modeler software suite.


CE_36T CTW: Power and Sample-Size Analysis in Stata
INSTRUCTOR(S): Yulia Marchenko

Power and sample-size analysis is a key component in designing a statistical study. It investigates the optimal allocation of study resources to increase the likelihood of the successful achievement of a study objective. How many subjects do we need in a study to achieve its research objectives? A study with too few subjects may have a low chance of detecting an important effect, and a study with too many subjects may offer little gain and thus waste time and resources. What are the chances of achieving the objectives of a study given available resources? Or, what is the smallest effect that can be detected in a study given available resources? This workshop will help answer these questions by demonstrating a number of examples of power and sample-size analysis for several statistical methods, including t-test, McNemar’s test, and ANOVA. It also will demonstrate how to compute power by simulation and take advantage of Stata’s power command’s automatic table and graph creation from the results of your simulation. No prior knowledge of Stata is required, but basic familiarity with power and sample-size analysis will prove useful.
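
The workshop itself uses Stata's power command; for reference, the same two approaches, analytic and simulation-based, look like this in base R.

```r
# Analytic: solve for n per group at a standardized effect of 0.5.
power.t.test(delta = 0.5, sd = 1, power = 0.80)

# Simulation-based power at n = 64 per group:
set.seed(1)
mean(replicate(2000,
  t.test(rnorm(64, 0.5), rnorm(64, 0))$p.value < 0.05))
```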


CE_37T CTW: Interactive Model Building in JMP Pro
INSTRUCTOR(S): Mia Stephens and Michael Clay

Building and comparing predictive models can be time consuming. However, JMP Pro Statistical Software from SAS provides a variety of tools (interactive graphs, data preparation, and analytics platforms) to help speed up the process. In this workshop, we use case studies to illustrate the model-building process in JMP. First, we look at data preparation and exploratory data analysis before building models. Then we consider a variety of modeling techniques: multiple linear and logistic regression, penalized regression, and classification and regression trees. We take advantage of JMP tools such as the Prediction Profiler and Interactive Solution Path in choosing a final model among candidate models. Finally, we use the Model Comparison platform in JMP to compare a variety of competing models. Background theoretical and technical details behind the modeling techniques are provided in addition to examples of their use in JMP.


CE_38T CTW: Improve Your Regression with Modern Regression Analysis Techniques: Linear, Logistic, Nonlinear, Regularized, GPS, LARS, LASSO, Elastic Net, MARS, TreeNet Gradient Boosting, Random Forests
INSTRUCTOR(S): Kaitlin Onthank and Ling Chen

Linear regression plays a big part in the everyday life of a data analyst, but the results aren’t always satisfactory. What if you could drastically improve prediction accuracy in your regression with a new model that handles missing values, interactions, AND nonlinearities in your data? Instead of proceeding with a mediocre analysis, join us for this presentation, which will show you how modern regression analysis techniques can take your regression model to the next level and expertly handle your modeling woes. Using real-world data sets, we will demonstrate advances in nonlinear, regularized linear, and logistic regression. This workshop will introduce the main concepts behind Leo Breiman’s Random Forests and Jerome Friedman’s GPS (Generalized Path Seeker), MARS (Multivariate Adaptive Regression Splines), and Gradient Boosting. With these state-of-the-art techniques, you’ll boost model performance without stumbling over confusing coefficients or problematic p-values!
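
As a small taste of regularized regression (one of the techniques listed above, though this workshop uses the SPM suite rather than R), the glmnet package fits the lasso and elastic net; built-in mtcars data serve as a stand-in.

```r
library(glmnet)

x <- model.matrix(mpg ~ ., data = mtcars)[, -1]   # predictors, intercept dropped
y <- mtcars$mpg

cvfit <- cv.glmnet(x, y, alpha = 1)   # alpha = 1: lasso; 0 < alpha < 1: elastic net
coef(cvfit, s = "lambda.min")         # sparse coefficients at the CV-chosen penalty
```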

All attendees will receive six months of access to fully functional versions of the SPM Salford Predictive Modeler software suite.


CE_39T CTW: Multilevel and Mixed Models in Stata
INSTRUCTOR(S): Bill Rising

This workshop covers the use of Stata to fit multilevel (mixed) models, models that contain multiple levels of nested random effects. These effects may take the form of random intercepts or random coefficients on regressors. After a brief review of the methodology, we will demonstrate the fitting of such models in Stata. The focus will be primarily on linear (Gaussian) models, but binary and count responses also will be considered. In addition to models with nested effects, models with crossed effects will be discussed. No prior knowledge of Stata is required, but familiarity with the methodology and experience in fitting these models by other means will prove useful.


CE_40P PSD: (SPANS TWO DAYS) Nontechnical Skills to Become a More Effective Collaborator
INSTRUCTOR(S): Eric Vance, Heather Smith, and Doug Zahn

(spans two days)
Part I: Saturday, August 8, 8:00 a.m. - 12:00 p.m.
Part II: Sunday, August 9, 8:00 a.m. - 12:00 p.m.

This practical workshop will help you become a more effective collaborator to solve real-world problems and implement solutions. The first half will help you learn and practice how to structure and conduct effective, efficient meetings with clients and colleagues. The second half will guide you through key communication skills essential for success as a statistician. Throughout this workshop, you will practice these nontechnical skills in groups. You also will learn how to analyze video data taken during meetings with clients to systematically improve your collaboration skills. The methods you learn and practice in this workshop will assist you in your efforts to improve your effectiveness as you serve in all your professional roles.


CE_41P PSD: Effective Presentations for Statisticians: Success = (PD)2
INSTRUCTOR(S): Jennifer van Mullekom and Stephanie P. DeHart

Public speaking is the number-one fear in America, yet the ability to speak well is absolutely critical for success in business settings. Statisticians must be able to effectively convey their ideas to clients, collaborators, and decision makers. Presenting in the modern world is even more daunting when speakers have the opportunity to employ slideware, videos, and live demos. Unfortunately, university coursework and professional development programs are often not targeted toward sharpening these skills. This short course, developed and taught by statisticians, will provide an opportunity to learn how to employ different methods and tools in the phases of the Success = (PD)2 framework. The material covered is geared toward scientific presentations and based on the works of Garr Reynolds and Michael Alley, among others. The course will emphasize the importance of stepping away from the computer to Prepare an effective message aimed at your core point, guided by a series of questions and tips. The Design phase emphasizes the importance of structure, streamlining, and good graphic design accompanied by a series of checklists. Of course, "Practice makes perfect," so we cannot skip this step. Finally, engaging the audience and effectively using the room and equipment is covered in the Deliver phase and is complemented with a handy list of dos and don’ts. No matter where you are in your journey for presentation success, improvement is always possible. We look forward to seeing you in this valuable class where you can hone your skills! Be prepared for an active class full of discussion and group exercises.


CE_42P PSD: (SPANS TWO DAYS) Preparing Statisticians for Leadership: How to See the Big Picture and Have More Influence
INSTRUCTOR(S): Bonnie LaFleur and Jim Hess

(spans two days)
Part I: Saturday, August 8, 1:00 p.m. - 6:30 p.m.
Part II: Sunday, August 9, 8:00 a.m. - 12:00 p.m.

What is leadership? Much has been written and discussed within the statistics profession in the last few years on the topic and its importance in advancing our profession. This course provides an understanding of leadership and how statisticians can improve and demonstrate leadership to affect their organizations. It features leaders from all sectors of statistics speaking about their personal journeys and provides guidance on personal leadership development with a focus on the larger organizational/business view and influence. Course participants work with their colleagues to discuss and resolve leadership situations statisticians face. Participants will come away with a plan for developing their own leadership and connect with a network of other statisticians who can help them move forward on their leadership journey.



Monday Roundtables and Speaker Luncheons

ML20 Designing Assessments That Support Teaching *and* Learning in Statistics
SPONSOR: Section on Statistical Education

INSTRUCTOR(S): Rochelle Tractenberg, Georgetown University

Psychologist and psychometrician Samuel Messick (1994) outlines three questions that support valid assessments: 1. What are the knowledge, skills, and abilities (KSAs) the curriculum should lead to? 2. What actions/behaviors by the students will reveal these KSAs? 3. What tasks will elicit these specific actions or behaviors (that reveal KSAs)? Learning can be positively affected by the assessment techniques used, and aligning instruction with assessments makes success on the assessments a clearer representation of learning. However, instructors often have no incentive to try new ideas with students; our experiments with new methods might fail to produce the desired learning, and we will think we wasted students' time and our own. Answering these questions should lead to a matrix against which assessments can be empirically compared. Modifying or developing assessments might require changes in time management, evaluation criteria, and levels of engagement by students, and not necessarily in teaching. We will discuss what participants may be contemplating or executing at their institutions and consider how to ensure assessments support both teaching and learning.



Tuesday Roundtables and Speaker Luncheons

TL05 What Makes One an Excellent Statistical Consultant?
SPONSOR: Section on Statistical Consulting

INSTRUCTOR(S): Vaneeta Kaur Grover, The Chemours Company, F.C., L.L.C

As statistical consultants, we all aspire to be excellent at what we do and to have our professional opinions and contributions valued by our clients, colleagues, and collaborators. But what makes one an excellent consultant? There is a fine line between being a good consultant and an excellent consultant. In this session, we will discuss some of the characteristics that define the difference and how you can incorporate these characteristics into your practice to achieve excellence. Two success factors beyond technical capabilities are leadership and communication skills. Bring your experiences and examples of working with/as excellent statistical consultants for discussion.



Wednesday Roundtables and Speaker Luncheons