Add-Ons
JSM sessions that require ticket purchase have limited availability and are therefore subject to sell-out or cancellation. Below are the functions that still have availability. Although this list is updated in real time, please bear in mind that tickets are sold online around the clock; if you plan to purchase a function ticket onsite and see the function on this list before you travel to JSM, we cannot guarantee it will still be available for purchase when you arrive at JSM. To find out how many tickets remain for a particular function, please contact the ASA at (703) 684-1221.
Available Add-Ons
- Professional Development
- Monday Roundtables and Speaker Luncheons
- Tuesday Roundtables and Speaker Luncheons
- Wednesday Roundtables and Speaker Luncheons
Professional Development
CE_22C Analysis of Clinical Trials: Theory and Applications
INSTRUCTOR(S): Devan Mehrotra, Alex Dmitrienko, and Jeff Maca
The course covers six important topics that commonly face statisticians and research scientists conducting clinical research: analysis of stratified trials, analysis of longitudinal data with dropouts, analysis of pharmacogenetics data, crossover trials, multiple comparisons, and interim decision making and adaptive designs.
The course offers a well-balanced mix of theory and applications. It presents practical advice from experts and discusses regulatory considerations. The discussed statistical methods will be implemented using SAS and R software. Clinical trial examples will be used to illustrate the statistical methods.
The course is designed for statisticians working in the pharmaceutical or biotechnology industries as well as contract research organizations. It is equally beneficial to statisticians working in institutions that deliver health care and government branches that conduct health-care related research. The attendees are required to have basic knowledge of clinical trials. Familiarity with drug development is highly desirable, but not necessary.
This course was taught at JSM 2005-2016 and received the Excellence in Continuing Education Award in 2005.
CE_23C Construction of Weights in Surveys
INSTRUCTOR(S): David Haziza
Most surveys are designed to provide statistics for a possibly (very) large number of characteristics of interest. Typically, the data collected are stored in a rectangular data file, each row corresponding to a sampled unit and each column to a characteristic of interest. A weighting system is made available on the data file; the idea is to construct a single weighting system applicable to all the characteristics of interest. The typical weighting process involves three major stages. At the first stage, each unit is assigned a base weight, defined as the inverse of its inclusion probability. At the second stage, the base weights are modified to account for unit nonresponse. At the last stage, the nonresponse-adjusted weights are further modified to ensure consistency between survey estimates and known population totals. When needed, the weights undergo a final modification through weight trimming or weight smoothing in order to improve the efficiency of survey estimates. The goal of the course is to provide a detailed description of each stage. Participants should have a background in survey sampling and regression analysis. The course is intended for survey statisticians working in survey organizations, graduate students, and users of survey data.
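The three weighting stages described above can be sketched with a toy example. This is a minimal illustration (hypothetical data and a simple ratio calibration), not material from the course:

```python
# Toy sketch of the three-stage survey weighting process:
# 1) base weights, 2) nonresponse adjustment, 3) calibration to known totals.

# Each sampled unit: inclusion probability, response status, stratum, study variable y.
sample = [
    {"pi": 0.10, "responded": True,  "stratum": "A", "y": 5.0},
    {"pi": 0.10, "responded": False, "stratum": "A", "y": None},
    {"pi": 0.20, "responded": True,  "stratum": "B", "y": 3.0},
    {"pi": 0.20, "responded": True,  "stratum": "B", "y": 4.0},
]

# Stage 1: base weight = inverse of the inclusion probability.
for u in sample:
    u["w"] = 1.0 / u["pi"]

# Stage 2: nonresponse adjustment within weighting classes (here, strata):
# respondents absorb the weight of nonrespondents in their class.
for s in {u["stratum"] for u in sample}:
    cls = [u for u in sample if u["stratum"] == s]
    resp = [u for u in cls if u["responded"]]
    factor = sum(u["w"] for u in cls) / sum(u["w"] for u in resp)
    for u in resp:
        u["w"] *= factor

respondents = [u for u in sample if u["responded"]]

# Stage 3: calibration (here, a simple ratio adjustment) so the weighted
# count matches a population total assumed known from an external source.
known_population_total = 30.0
g = known_population_total / sum(u["w"] for u in respondents)
for u in respondents:
    u["w"] *= g

estimate = sum(u["w"] * u["y"] for u in respondents)  # weighted total of y
```

In practice each stage is far more elaborate (e.g., response-propensity classes, generalized regression calibration), which is what the course covers in detail.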
CE_24C Analysis of Categorical Data
INSTRUCTOR(S): Christopher Bilder and Thomas Loughin
We live in a categorical world! From a positive or negative disease diagnosis to choosing all items that apply in a survey, outcomes are frequently organized into categories so that people can make sense of them. In this course, participants will learn how to analyze the most common types of categorical data. The course is divided into four main sections. The first three sections are organized by response type: 1) binary/binomial, 2) multicategory, and 3) count. Within each section, we examine how to estimate and interpret appropriate models while giving practical advice on their use. The fourth section applies model selection and evaluation methods to those models discussed in the first three. The ideal background for participants is experience with multiple linear regression and with the application of likelihood-based methods (e.g., likelihood ratio). All computations will be performed using R, so familiarity with its basics is recommended. Participants who will benefit the most are those who do not have experience with categorical methods and those who already have experience but have not used them in R. In addition to handouts and R programs to perform every computation, a recording of the course will be available to participants after JSM.
CE_25C High-Dimensional Covariance Estimation and Portfolio Selection
INSTRUCTOR(S): Mohsen Pourahmadi
The course provides a broad introduction to covariance estimation for high-dimensional data and its role in portfolio selection in finance. In high dimensions, the sample covariance matrix is a notoriously poor estimator of its population counterpart. We discuss two useful and viable alternatives in detail. The first is the class of shrinkage estimators, which makes minimal structural assumptions on the population covariance matrix and shrinks the sample eigenvalues toward a central value; it includes the well-conditioned Ledoit-Wolf estimator and is similar in spirit to ridge regression. The second class is centered around factor models and principal component analysis (PCA) and makes structural assumptions that are hard to verify. Various case studies and data sets will be discussed in detail using existing R packages.
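The shrinkage idea can be sketched in a few lines. This simplified illustration uses a fixed shrinkage intensity and a scaled-identity target; the actual Ledoit-Wolf estimator chooses the intensity from the data, which the course develops properly:

```python
# Minimal sketch of linear shrinkage: Sigma = delta * F + (1 - delta) * S,
# where S is the sample covariance and F = mu * I is a scaled identity target.
# (Simplified: delta is fixed here; Ledoit-Wolf estimates it from the data.)

def sample_covariance(rows):
    """Sample covariance matrix (divisor n) of a list of observation vectors."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / n
             for j in range(p)] for i in range(p)]

def shrink(S, delta):
    """Shrink S toward F = mu * I, where mu is the mean of S's eigenvalues
    (equal to the average diagonal entry)."""
    p = len(S)
    mu = sum(S[i][i] for i in range(p)) / p
    return [[delta * (mu if i == j else 0.0) + (1 - delta) * S[i][j]
             for j in range(p)] for i in range(p)]

data = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
S = sample_covariance(data)
Sigma = shrink(S, delta=0.5)
```

The effect is to pull the extreme sample eigenvalues toward their mean, which is what makes the shrunk estimator well-conditioned even when the dimension is large relative to the sample size.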
CE_26C Research and Analysis Workflows: Low-cost, Every-day Project Management Techniques, Tools, and Tips that Produce High-quality, Streamlined, Stress-free Research and Data Science
INSTRUCTOR(S): Matt Jans and Abhijit Dasgupta
This full-day course assumes you want fewer data collection and analysis mistakes in your work, more efficient and productive meetings, and time and sanity back in your life. This course presents the most useful and generally applicable tools and techniques culled from the instructors' 30+ combined years of experience in statistics, data science, collaborative biomedical research, consulting, public health, social science, and survey research. It emphasizes simple tools and basic habits that will streamline your research process, whether you are involved in data collection, statistical analysis, or other aspects of the research lifecycle. It focuses on tools that cost little beyond the time and effort it takes to learn and practice them. The first half of the course covers general project and time management techniques. The second half focuses on best practices for your data science pipeline to minimize errors, maximize time to think, and maintain reproducibility. All techniques taught have been tested and adapted by the instructors in their project management work. Students will benefit from the instructors' extensive research and statistical experience and will leave with a collection of concrete tools and tips that they can implement immediately.
CE_27C Essentials of High Performance and Parallel Statistical Computing with R
INSTRUCTOR(S): Wei-Chen Chen and George Ostrouchov
This is an introductory course in high performance and parallel statistical computing, which is essential for statistical modeling when dealing with big data. We introduce fundamentals of parallel statistical computing, including the use of the pbdR package ecosystem on larger platforms. We present a broad overview of parallel programming paradigms and relate parallel approaches within R for statistical computation. Practical examples are discussed, beginning with strategies for speeding up serial R code and continuing with parallel approaches of increasing complexity. Computing platforms ranging from multicore laptops to medium and even large distributed systems are covered. We bring a coherent approach based on established advanced parallel computing concepts from the high performance computing (HPC) community, all within the comfort of R. Basic knowledge of R and statistical computing is assumed.
CE_29T Analyzing Temporal and Spatiotemporal Data in IBM Products
INSTRUCTOR(S): David Nichols, Svetlana Levitan, and Hui Yang
This workshop will introduce participants to the theory of each algorithm and demonstrate their use in IBM products with real-life data. These methods will be demonstrated in IBM SPSS Modeler and IBM SPSS Statistics products, as well as the new cloud-based IBM Data Science Experience. Basic knowledge of time series analysis and data mining is assumed.
CE_30T Analyzing Multilevel Models with the GLIMMIX Procedure
INSTRUCTOR(S): Min Zhu
This tutorial will show you how to construct a multilevel model to account for variability at each level through both explanatory and random variables. Then you will learn how to use the generalized linear mixed model procedure GLIMMIX in SAS/STAT® to estimate multilevel models for both continuous and discrete responses. You will also learn about enhanced weighting options for PROC GLIMMIX that handle weights at different levels. Finally, you will see how to apply these features to analyzing complex survey data collected by multistage sampling with unequal sampling probabilities.
CE_31T Introduction to Data Mining with CART Classification and Regression Trees
INSTRUCTOR(S): Dan Steinberg, Mikhail Golovnya, and Charles Harrison
This tutorial is intended for the applied statistician wanting to understand and apply the CART classification and regression trees methodology. The emphasis will be on practical data analysis and data mining involving classification and regression.
CE_32T Bayesian Analysis Using Stata
INSTRUCTOR(S): Yulia Marchenko
This workshop covers the use of Stata to perform Bayesian analysis. I will demonstrate the use of Bayesian analysis in various applications and introduce Stata’s suite of commands for conducting Bayesian analysis. No prior knowledge of Stata is required, but basic familiarity with Bayesian analysis will prove useful.
CE_33T Causal Treatment Effect Analysis Using SAS/STAT Software
INSTRUCTOR(S): Yiu-Fai Yung
This workshop introduces two SAS/STAT® procedures, CAUSALRT and PSMATCH, for the analysis of causal treatment effects from observational data. It also gives a brief, high-level account of causal inference issues and the principles that underlie the two procedures. Basic familiarity with generalized linear models is assumed.
CE_35T Data Visualization for Life Sciences with JMP
INSTRUCTOR(S): Kelci Miclaus and Richard Zink
The goal of this workshop is to describe data visualization techniques for understanding and communicating results from applications in clinical trials and genomics research using the JMP family of products. Topics include distributional summaries, signal detection for safety outcomes, trends and abnormalities in findings, data integrity, co-occurrence, clustering and correlations, genomic associations, subgroup analysis, and meta-analysis.
CE_36T Advanced Methods for Survival Analysis Using SAS/STAT Software
INSTRUCTOR(S): Changbin Guo
This tutorial begins with a review of basic concepts and then presents two sets of model assessment methods—concordance statistics and time-dependent ROC curves—that are available in the PHREG procedure in SAS/STAT 14.2. Next, the tutorial introduces the ICLIFETEST and ICPHREG procedures for the analysis of interval-censored data. The tutorial then turns to the analysis of competing risks data and explains how to use the LIFETEST procedure to conduct nonparametric survival analysis and the PHREG procedure to investigate the relationship of covariates to cause-specific failures. A basic understanding of applied statistics is assumed.
CE_37T Evolution of Classification: From Logistic Regression and Decision Trees to Bagging/Boosting and Netlift Modeling
INSTRUCTOR(S): Mikhail Golovnya, Charles Harrison, and Dan Steinberg
This presentation will cover recent improvements to conventional decision tree and logistic regression technology via two case study examples: one in direct marketing and the second drawn from biomedical data analysis. Within the context of real-world examples, we will illustrate the evolution of classification by contrasting and comparing: Regularized Logistic Regression, CART, Random Forests, TreeNet Stochastic Gradient Boosting, and RuleLearner.
CE_38T Weighted GEE Analysis Using SAS/STAT Software
INSTRUCTOR(S): Michael Lamm
This workshop introduces you to the GEE procedure (new in the SAS/STAT 13.2 release), which supports both the standard and weighted GEE methods for analyzing longitudinal data.
You will learn about the different mechanisms used to describe why a response is missing and how the missing data mechanism affects inference using the standard and weighted GEE approaches. A basic familiarity with generalized linear models is assumed.
CE_39T Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets
INSTRUCTOR(S): Dan Steinberg, Mikhail Golovnya, and Charles Harrison
In this presentation, specifically designed for statisticians, we will show how you can quickly and easily create data mining models. This tutorial follows a step-by-step approach to introduce advanced automation technology, including CART, MARS, TreeNet Gradient Boosting, Random Forests, and the latest multi-tree boosting and bagging methodologies by the original creators of CART.
Monday Roundtables and Speaker Luncheons
ML19 Pursuing a Career in Statistics and Statistical Programming - a Students' Perspective
SPONSOR: Section for Statistical Programmers and Analysts
SPEAKER(S): Jessica Colson
Whether we realize it or not, statistics is part of our everyday life. From sports to wellness and the environment, many fields rely on gathering data and interpreting it through simple or sophisticated statistical analysis, converting the data into meaningful information for decision making. Because of this, the demand for statisticians, statistical programmers, and data analysts has been growing and will continue to grow. The key question is: how does one decide to consider and pursue a career in these fields? For those of us already in the field, or those like me who have just started on this career track, what are our responsibilities to encourage others to consider and pursue these careers, given that it will take a lot more than what is being done via digital media? This roundtable will be a great opportunity to hear the specific thoughts and experiences that others at the table want to share, whether as individuals or based on their roles, including (but not limited to) ASA or other sponsored activities.
ML25 Data Science and Environmental Statistics
SPONSOR: Section on Statistics and the Environment
SPEAKER(S): Stephan Sain
Data science has been receiving a lot of attention recently, and it goes beyond search engines and online retailers. Data science is influencing fundamental research in many areas of academia and industry, including areas that might be of interest to ENVR members. Climate science is being driven by big data and distributed computing. Digital agriculture is using data science, along with the ever-increasing amount of data from highly diverse sources, to redefine how farmers make decisions. In this roundtable, we will discuss such topics as 1) the role of statistics in data science and 2) how big data and data science are influencing research in academia and industry, especially research that involves the environment.
Tuesday Roundtables and Speaker Luncheons
TL06 Advances and Challenges in Disease Surveillance
SPONSOR: Section on Statistics in Defense and National Security
SPEAKER(S): Ronald Fricker, Virginia Tech
Throughout history, the human race has periodically been ravaged by disease. One of the most extreme examples is the Black Death, the bubonic plague pandemic of the 14th century that originated in Central Asia and by some estimates killed 60% of Europe's population. Seeking to avoid pandemics, and because the threat of bioterrorism today makes timely and effective disease surveillance as much a national security priority as a public health priority, public health agencies conduct disease surveillance by actively gathering and analyzing human health and disease data. The two main aspects of disease surveillance are (1) early event detection and (2) situational awareness, and statistical methods have played, and continue to play, an important role in both. In this roundtable, we will discuss recent advances in statistical methods for disease surveillance as well as ongoing research and implementation challenges.
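One classic statistical tool for early event detection is the one-sided CUSUM chart, which accumulates evidence of an upward shift in case counts. A minimal sketch (illustrative counts, reference value, and threshold; not material from the roundtable):

```python
# Minimal sketch of a one-sided CUSUM for early event detection in daily
# syndromic counts: C_t = max(0, C_{t-1} + (x_t - k)); alarm when C_t > h.
# k is a reference value a bit above the baseline mean; h is the alarm threshold.

def cusum_alarms(counts, k, h):
    """Return the (0-indexed) days on which the CUSUM statistic exceeds h."""
    c, alarms = 0.0, []
    for day, x in enumerate(counts):
        c = max(0.0, c + (x - k))
        if c > h:
            alarms.append(day)
    return alarms

# Illustrative daily case counts with an upward shift starting on day 5.
daily_counts = [10, 9, 11, 10, 10, 15, 16, 17, 15, 16]
alarms = cusum_alarms(daily_counts, k=11.0, h=8.0)
```

By accumulating small excesses over the reference value, the CUSUM detects a sustained shift sooner than a rule that flags only single extreme days, which is exactly the timeliness concern in early event detection.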
TL25 Best Practices in Predicting Customer Attrition
SPONSOR: Section on Statistics in Marketing
SPEAKER(S): Adraine Upshaw, BBVA Compass
Customer acquisition is a major focus of marketing departments in any organization; customer attrition, or churn, should receive the same focus. In 2014, the Harvard Business Review reported that acquiring a new customer costs more than five times as much as retaining an existing one. In the era of globalization, profit margins are shrinking in all industries. Data mining techniques can be used to retain profitable customers. In this roundtable, we will discuss various approaches to predicting customer churn.