
JSM sessions that require ticket purchase have limited availability and are therefore subject to sell-out or cancellation. Below are the functions that still have availability. Although this list is updated in real time, please bear in mind that tickets are sold online around the clock; if you plan to purchase a function ticket onsite and see the function on this list before you travel to JSM, we cannot guarantee it will still be available for purchase when you arrive at JSM. To find out how many tickets remain for a particular function, please contact the ASA at (703) 684-1221.


Available Add-Ons


Continuing Education and Computer Technology Workshops

CE_02C Introduction to Data Visualisation and Analysis with R
Cosponsors: Statistical Computing Section, Section for Statistical Programmers and Analysts, Section on Statistical Graphics

Instructors: John Emerson and Hadley Wickham

This full-day course will take you beyond a standard introduction to R, emphasizing the use of statistical graphics for data exploration and analysis. It will include topics such as visualization with base graphics and ggplot2, an introduction to grid graphics programming, data manipulation with plyr, and introductory parallel programming with foreach. The course will provide several real-data examples for graphical exploration and statistical analysis. You don't need to know R, although a basic familiarity with R or experience programming or scripting in another language or environment would be useful.


CE_03C Analysis of Clinical Trials: Theory and Applications
Cosponsor: Biopharmaceutical Section

Instructors: Alex Dmitrienko, Devan Mehrotra, and Keaven Anderson

The course covers five important topics that commonly face statisticians and research scientists conducting clinical research: analysis of stratified trials, analysis of longitudinal data with dropouts and potential outliers, analysis of time-to-event data (with emphasis on small trials), multiple comparisons and multiple endpoints, and interim analysis and interim data monitoring. The course offers a well-balanced mix of theory and applications. It presents practical advice from experts and discusses regulatory considerations. The statistical methods discussed will be implemented primarily in SAS, with R used for group sequential design. Clinical trial examples will be used to illustrate the statistical methods. The course is designed for statisticians working in the pharmaceutical or biotechnology industries as well as contract research organizations. It is equally beneficial to statisticians working in institutions that deliver health care and government branches that conduct health-care-related research. Attendees are required to have basic knowledge of clinical trials. Familiarity with drug development is highly desirable, but not necessary. This course was taught at JSM 2005-2011 and received the Excellence in Continuing Education Award in 2005.


CE_07C Statistics for Spatio-Temporal Data
Instructors: Christopher Wikle and Noel Cressie

The course will follow the recently published book by Cressie and Wikle, Statistics for Spatio-Temporal Data (2011), John Wiley and Sons, Hoboken, NJ. It is a state-of-the-art presentation of spatio-temporal processes, bridging classic ideas with modern hierarchical statistical modeling concepts. From understanding environmental processes and climate trends to developing new technologies for mapping public-health data and the spread of invasive species, there is a high demand for statistical analyses of data that take spatial, temporal, and spatio-temporal information into account. We will present a systematic approach to key quantitative techniques for the statistical analysis of such data that features hierarchical (empirical and Bayesian) statistical modeling, with an emphasis on dynamical spatio-temporal models. Prerequisite: A Master's or PhD degree in statistics. Background should include Master's-level probability and statistical inference and a good understanding of matrix algebra.


CE_08C Targeted Learning: Causal Inference for Observational and Experimental Data
Instructors: Mark van der Laan, Maya Petersen, and Sherri Rose

This course concerns statistical methods for causal inference using observational and experimental point treatment and longitudinal data. Attendees will learn both the application of and theory behind methodological advances for causal inference. A review and scientific critique of current estimation will be provided, including an introduction to targeted learning. Structural causal models (causal graphs) and working marginal structural models will be introduced as tools for translating a policy or research question and background knowledge into a target statistical quantity and model. The course will emphasize understanding and responding to the challenges posed by randomized controlled trials and observational cohorts, including informative drop-out/censoring, missing data, time-dependent confounding, and high-dimensional covariates. Examples from the areas of HIV research and the epidemiology of aging, together with other fields, will be used as illustrations and will provide practical experience with analytic design and accurate interpretation of results. The anticipated audience includes statisticians with a strong background in maximum likelihood estimation and possible previous exposure to causal assumptions, e.g., Chapters 1, 2, 4.1-4.4, and 7 in Causality by Pearl. Course content covers material from Chapters 1-7, 9, 10, and 16-18 of Targeted Learning by van der Laan & Rose, as well as additional advances.


CE_09C Introduction to Bayesian Methods and Software for Data Analysis
Cosponsor: Section on Bayesian Statistical Science

Instructors: Bradley P. Carlin and Laura A. Hatfield

Hierarchical Bayes methods enable combining information from similar and independent experiments, yielding improved inference for individual and shared model characteristics. Bayesian structuring also provides an effective "procedure-generator." With recent advances in computing and the consequent ability to evaluate complex models, Bayesian data analysis methods have gained popularity. This course introduces hierarchical Bayes methods, demonstrates their usefulness in challenging applied settings, and illustrates their implementation using Markov chain Monte Carlo (MCMC). We also provide an introduction to BUGS and other software packages (such as R2WinBUGS) for calling BUGS from R. Use of the methods will be demonstrated in advanced settings (e.g., nonlinear longitudinal modeling or spatio-temporal estimation and mapping), where the MCMC Bayesian approach often provides the only feasible alternative that incorporates all relevant model features. Participants should have an MS-level understanding of mathematical statistics, e.g., at the level of Casella and Berger (2001). Familiarity with common statistical models (e.g., the linear regression model) and computing is assumed, but we do not assume previous exposure to Bayes. The course is aimed at students and practicing statisticians who are intrigued by Bayes and Gibbs, but who may still mistrust the approach as theoretically mysterious and practically cumbersome.


CE_11C Analysis of Data from Complex Surveys Using R
Cosponsor: Statistical Computing Section

Instructor: Thomas Lumley

Data from complex sampling designs are increasingly being used in regression modelling, either in secondary analysis of large national and international surveys or when additional expensive variables must be measured on a subsample of an existing cohort. Many of the differences between independently-sampled data and complex samples can be hidden by appropriate software, but some differences must be understood by the analyst. This course will cover exploratory data analysis and graphics, regression modelling, and calibration of weights for data from complex surveys. Examples will include both multistage national surveys and two-phase samples from existing cohorts. Participants should have basic knowledge of survey design and analysis (e.g., chapters 1-6 of Heeringa, West, and Berglund, Applied Survey Data Analysis), and practical experience either in using R for data analysis or in complex survey analysis using other software. Data and code examples will be available; participants may wish to bring a laptop.


CE_18C Meta-Analysis: Combining the Results of Multiple Studies
(HALF-DAY COURSE)

Instructors: Christopher H. Schmid and Ingram Olkin

Meta-analysis enables researchers to synthesize the results of multiple studies designed to determine the effect of a treatment, device or test. The information explosion as well as the movement toward the requirement of evidence to support policy decisions has promoted the use of meta-analysis in all scientific disciplines. Statisticians play a major role in meta-analysis because analyzing data with few studies and many variables is difficult. In this workshop, we introduce the major principles and techniques of statistical analysis of meta-analytic data. Examples of published meta-analyses in medicine and the social sciences will be used to illustrate the various methods.


CE_19C Design and Analysis of Biomarker Studies for Risk Prediction
(HALF-DAY COURSE)

Cosponsor: Biometrics Section

Instructors: Tianxi Cai and Yingye Zheng

Accurate and individualized outcome prediction promises to dramatically change clinical decision making in many branches of medicine, for example in early diagnosis of cancer and in selecting patient-specific treatments. But translating the promise into reality is not easy. Clinical evaluations, while remaining an essential basis for risk assessment, may not be sufficient for complex diseases. Improved prediction may be achieved by combining information from multiple markers based on emerging new technology such as gene expression profiling, protein mass spectrometry, and positron emission tomography. Most marker tests are imperfect, and incorrect test results can have enormous consequences in both financial and human terms. Prior to incorporating a biomarker into standard clinical care, rigorous evaluation is required. Designing a rigorous study that efficiently uses available biologic specimens is critical. Compared to classical statistical methods for evaluating medical diagnostic tests, there is relatively little literature devoted to statistical methods for marker development carried out in a prospective cohort study with censored failure time outcome. This short course will introduce recent statistical developments for constructing and evaluating risk prediction models (markers) with censored data. While providing some mathematical details, we will emphasize the concepts, methods, and their real-world applications with the aim of both (i) offering an overview of the rapidly developing area of risk prediction and biomarker evaluation and (ii) providing in-depth discussion of efficient design of biomarker and risk prediction studies. Prerequisite: Survival analysis.


CE_20C Bayesian Clinical Trials
Cosponsor: Section on Bayesian Statistical Science

Instructors: Scott Berry and Kert Viele

The Bayesian approach was long thought to be a superior philosophical approach to statistical inference. While initial adoption of Bayesian methods was hampered by computational limitations, modern computing has resulted in the Bayesian approach having critical practical importance. The Bayesian approach has been utilized and is continuing to grow in its practical usage in clinical trials. In this course we present a variety of uses of the Bayesian approach in medical research and clinical trials. The focus is on the applied and practical uses of the Bayesian approach to critical aspects of drug and device development and biostatistics. Many examples will be provided from different stages of development (including comparative effectiveness) and therapeutic areas.


CE_21C An Introduction to Statistical Learning
Cosponsor: Section on Statistical Learning and Data Mining

Instructors: Gareth James and Yufeng Liu

This one-day seminar will be a practical introduction to and an overview of statistical learning methods. The course aims to go far beyond classical statistical methods such as linear regression. As computing power has increased over the last 20 years, many new, highly computational regression, or "Statistical Learning", methods have been developed. In particular, the last decade has seen a significant expansion of the number of possible approaches. This course aims to provide an applied overview of such modern methods as Cross-validation, Lasso, Generalized Additive Models, Decision Trees and Support Vector Machines as well as more classical approaches such as Linear Discriminant Analysis, Quadratic Discriminant Analysis, Nearest Neighbors and Ridge Regression. Participants should be familiar with linear regression.


CE_22C Generalized Additive Models and their Extensions: The Penalized Regression Spline Approach
Cosponsor: Section on Physical and Engineering Sciences

Instructor: Simon Wood

This course provides an overview of the theory of generalized additive models represented by reduced rank penalized splines, and their practical use with the mgcv package in R. Here generalized additive models include generalized additive mixed models, varying coefficient/geographic regression models, structured additive regression models, generalized linear additive smooth structure models, signal regression models, etc., since all of these fit into the same inferential and computational framework (quadratically penalized GLMs). The course will give a compact overview of the essential theory of penalized regression splines and GAMs, focusing on the key theoretical concepts that underpin the more detailed literature: bases, penalties, the Bayesian model of smoothing, and smoothing parameter selection. It will then cover the various types of smooth (one dimensional, isotropic and tensor product interactions) that form the basic toolkit for model construction. Model checking, building and selection will be discussed, including practical exercises with the mgcv package in R. The course will finish with a look at some more advanced GAM topics: spatial and temporal autocorrelation, functional data analysis, and inference via posterior simulation. Participants should preferably bring a laptop with the latest version of R installed. Reading: Wood, S.N. (2006) Generalized Additive Models: An Introduction with R.


CE_23C Design and Analysis of Non-Inferiority Trials
Cosponsor: Biopharmaceutical Section

Instructor: Brian Wiens

This one-day short course will review the basics of non-inferiority clinical trials and provide discussion on advanced aspects. Participants are expected to have some (at least minimal) experience with clinical trials aimed at regulatory approval for drugs, biologics or medical devices. The first half of the course will discuss background, assumptions, analyses and intended uses of non-inferiority trials, with focus on regulatory review of new medical products. Methods for choosing a sample size and for assuring validity and assay sensitivity will be discussed. The second half will include analysis methods for continuous data, binary data (including matched pair designs) and time-to-event data, and issues with non-inferiority trials, including multiple comparisons, missing data and adaptive designs. Throughout the course, examples will be presented to illustrate the concepts.


CE_24C Principles and Applications of Multivariate Analysis
Instructor: Peter Bajorski

This course is a review of the fundamentals of multivariate analysis. In addition to a general understanding of variability in multiple variables, we discuss the classic tools of principal component analysis (PCA), canonical correlation, and classification. We will also discuss more modern concepts of independent component analysis and nonnegative PCA. The course is for those who either already took a course in multivariate analysis but never developed an intuitive feel for it, or never took such a course and would like to learn the subject now. For the first group, this course will provide the geometric and intuitive way of thinking about multiple dimensions and finally make it all click into a coherent approach. For the second group, this course is a great starting point before they embark on a more formal and deeper study of any of the topics in multivariate analysis, including both the classic and modern methods. The course provides many examples of analyses of high-dimensional data, including spectral and imaging data. R code will be provided, but knowledge of R is not required. Participants are expected to know the basics of linear and matrix algebra and univariate statistical inference.


CE_25C Analysis of Overdispersed Data using SAS®
Instructors: Jorge Morel and Nagaraj Neerchal

Overdispersion (extravariation) arises in binomial/multinomial/count data when variances are larger than those allowed by the binomial/multinomial/Poisson model. This phenomenon is caused by clumping, the presence of an excess of zeros in the data, lack of independence, or clustering. Commonly used overdispersion models include the Beta-binomial, the Random-clumped Binomial, the Zero-inflated Binomial, the Negative-binomial, the Zero-inflated Poisson and Negative-binomial, the Poisson and Negative-binomial Hurdle models, the Dirichlet-multinomial, and the Random-clumped Multinomial. When covariates are available, the mean and the overdispersion parameters can be modeled using appropriate link functions as in Generalized Linear Models (GLM). Such models will be called Generalized Linear Overdispersion Models (GLOM). GLOM do not always belong to the exponential family, and are therefore not usually covered under GLM expositions. The aim of the course is to introduce GLOM using several real-life examples, and illustrate the main methods of estimation such as Quasi-likelihood, Maximum Likelihood and Generalized Estimating Equations. Examples will be analyzed using the SAS® procedures COUNTREG, LOGISTIC, GENMOD, GLIMMIX, NLMIXED and SURVEYLOGISTIC. Basic knowledge of the binomial, Poisson and multinomial distributions is required. Familiarity with logistic and Poisson regressions is recommended.
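
As a small illustration of what "variances larger than those allowed by the binomial model" means, consider the Beta-binomial case. The notation below (n, a, b, \pi, \rho) is assumed here for illustration and is not taken from the course materials.

```latex
\[
Y \mid P \sim \mathrm{Binomial}(n, P), \qquad
P \sim \mathrm{Beta}(a, b), \qquad
\pi = \frac{a}{a+b}, \quad \rho = \frac{1}{a+b+1},
\]
\[
\mathrm{E}(Y) = n\pi, \qquad
\mathrm{Var}(Y) = n\pi(1-\pi)\,\bigl[1 + (n-1)\rho\bigr] \;\ge\; n\pi(1-\pi).
\]
```

The factor 1 + (n-1)\rho is the variance inflation relative to the binomial model; it equals 1 only when n = 1 or in the limit \rho \to 0.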


CE_26C Simulation and Sampling of Data
(HALF-DAY COURSE)

Cosponsor: Section for Statistical Programmers and Analysts

Instructor: Rick Wicklin

Sampling and simulation are fundamental techniques in statistical programming. To assess statistical methods, you often need to create data with known properties, both random and nonrandom. This workshop presents techniques for simulating data with particular properties. The student will learn to sample data from common discrete and continuous distributions; heavy-tailed, skewed, and mixture distributions; multivariate distributions; and distributions with known properties such as a specific covariance structure or a known regression structure. The student will learn to use simulated data to estimate the probability of an event, estimate the sampling distribution of a statistic, estimate the coverage probabilities of approximate confidence intervals, and evaluate the robustness of a statistical test when assumptions of the test are not satisfied. This workshop is intended for practicing statisticians who need to simulate data. Prerequisites include standard statistical concepts such as distributions, variance, correlation, and regression. Examples are presented using the SAS system. To follow the programming examples, it will be helpful if participants are familiar with basic SAS programming and procedures. Participants who are unfamiliar with SAS/IML software can refer to the book Statistical Programming with SAS/IML Software (Wicklin 2010).
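
To give a flavor of one task named above, estimating the coverage probability of an approximate confidence interval, here is a minimal sketch. The workshop itself uses SAS; Python is substituted here purely for illustration, and the function name, sample size, replication count, and distributions are all assumptions rather than anything taken from the workshop.

```python
import math
import random
import statistics


def t_interval_coverage(draw_sample, true_mean, n=20, reps=5000, t_crit=2.093, seed=1):
    """Estimate how often a nominal 95% t interval for the mean covers true_mean.

    t_crit=2.093 is the 0.975 quantile of the t distribution with 19 degrees
    of freedom, so it is only appropriate for the default n=20.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [draw_sample(rng) for _ in range(n)]
        mean = statistics.fmean(sample)
        half_width = t_crit * statistics.stdev(sample) / math.sqrt(n)
        if mean - half_width <= true_mean <= mean + half_width:
            hits += 1
    return hits / reps


# Normal data: the interval's assumptions hold, so coverage should be near 0.95.
normal_cov = t_interval_coverage(lambda rng: rng.gauss(0.0, 1.0), true_mean=0.0)

# Exponential data with mean 1: skewed, so the normality assumption is violated.
expo_cov = t_interval_coverage(lambda rng: rng.expovariate(1.0), true_mean=1.0)

print(f"normal: {normal_cov:.3f}  exponential: {expo_cov:.3f}")
```

Comparing the two estimates is a simple instance of evaluating robustness when an assumption of the procedure is not satisfied.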


CE_27T SAS® Procedures for Analyzing Survey Data
Instructor: Pushpal K. Mukhopadhyay

The analysis of probability-based sample surveys requires specialized techniques that account for survey design elements, such as strata, clusters, and unequal weights. This workshop provides an overview of the basic functionality of the SAS/STAT® procedures that are developed specifically for selecting and analyzing probability samples for survey data. You will learn how to select probability samples with the SURVEYSELECT procedure, how to produce descriptive statistics with the SURVEYMEANS and SURVEYFREQ procedures, and how to build statistical models with the SURVEYREG, SURVEYLOGISTIC, and SURVEYPHREG procedures. Characteristics of different variance estimation techniques, including both Taylor series and replication methods, and domain (subpopulation) estimation techniques will also be discussed. The workshop is intended for a broad audience of statisticians who are interested in analyzing sample survey data. Familiarity with basic statistics, including regression analysis, is strongly recommended.


CE_28T Predictive Modeling in the Life Sciences
Instructors: Russell Wolfinger and Richard Zink

Unlike commercial data mining applications in finance, retail, and telecommunications, data sets from life science domains typically have orders of magnitude more predictors than observations. In these "wide data" instances, it is very easy to overfit the data with predictive models. Furthermore, it is rarely obvious what form of predictive model will be best for a new data set. Consequently, honest cross-validation model comparison is essential to achieve some assurance of generalizability and optimality. In this hands-on workshop, we will introduce the predictive modeling capabilities of JMP Clinical and JMP Genomics. These products combine the analytical power of SAS with the elegant user interface and dynamic graphics of JMP. We'll describe techniques for reducing the predictor space, demonstrate tools for comparing a large pool of potential models to find the best ones for a given data set, and discuss drill-down actions for determining the usefulness of a particular model. Data from a clinical trial of aneurysmal subarachnoid hemorrhage and a genomics study using next-generation sequencing will provide illustration. Participants will be provided with a free trial version of JMP and a journal with results scripts. It is assumed that attendees have had a prior course in statistical modeling.


CE_29T Advances in Tree-Based Modeling Tools, Data Mining, Predictive Analytics, and Modeling Automation Technology: Introduction to SPM, Salford Predictive Modeler
Instructor: Dan Steinberg

This tutorial will introduce SPM, Salford Systems' Predictive Modeling Suite. SPM incorporates advanced automation technology and includes CART, MARS, TreeNet, Random Forests, and the latest multi-tree boosting and bagging methodologies by the original creators of CART (Breiman, Friedman, Olshen and Stone). SPM technologies span classification, regression, hotspot analysis, variable interaction detection, missing value analysis, and clustering/segmentation to cover all aspects of a data mining, predictive analytics, or predictive modeling project. The tutorial will introduce you to SPM product functionality and show you how to benefit from SPM automation technology. All attendees will receive 6 months' access to fully functional versions of the software.


CE_30T Advances in Data Mining, State-of-the-Art Algorithms from Jerome Friedman: GPS (Generalized Pathseeker), ISLE (Importance Sampled Learning Ensembles), and Rulefit Rule Extraction Engine
Instructor: Mikhail Golovnya

Using real-world data sets, we will demonstrate Stanford Professor Jerome Friedman's advances in regularized linear and logistic regression and important extensions to gradient boosted tree technology. GPS allows for ultra-fast modeling with massive numbers of predictors, with powerful predictor selection and coefficient shrinkage; it includes classic techniques such as ridge and lasso regression as well as the new sub-lasso model, and its clear tradeoff diagrams between model complexity and predictive accuracy allow modelers to select an ideal balance. ISLE compresses tree ensembles: complex many-tree ensembles can be simplified and pruned via ISLE compression, yielding simpler and faster-executing ensembles. RULEFIT uses TreeNet and/or RandomForests tree ensembles as rule search engines and extracts individual nodes to exhibit interesting and predictive rules; the rules are optimally combined to yield models that are often more accurate than the original ensembles. RULEFIT also supports individual-specific and group-specific variable importance rankings and offers dependency plots for model interpretation. This tutorial will show real-world examples, discuss key algorithmic details, and cover implementation and best practices. Prerequisites: This course is intended to be accessible to anyone with experience with regression modeling.


CE_31T Creating Statistical Graphics with SAS®
Instructor: Warren Kuhfeld

Effective graphics are indispensable in modern statistical analysis. SAS 9.2 provides ODS Graphics, new functionality used by statistical procedures to create statistical graphics as automatically as they create tables. ODS Graphics is also used by new procedures that are designed for graphical exploration of data. This tutorial is intended for statistical users and covers the use of ODS Graphics from start to finish in statistical analysis. You will learn how to request graphs created by statistical procedures; use the new SGPLOT, SGPANEL, SGSCATTER, and SGRENDER procedures to create customized graphs; access and manage your graphs for inclusion in Web pages, papers, and presentations; modify graph styles (colors, fonts, and general appearance); make immediate changes to your graphs using a point-and-click editor; make permanent changes to your graphs with template changes; and use new SAS 9.3 features.


CE_32T Survey Data Analysis with Stata
Instructor: Jeffrey Pitblado

This workshop covers how to use Stata for survey data analysis assuming a fixed population. Knowledge of Stata is not required, but attendees are assumed to have some statistical knowledge, such as what is typically covered in an introductory statistics course. We will begin by reviewing the sampling methods used to collect survey data, and how they affect the estimation of totals, ratios, and regression coefficients. We will then cover the three variance estimators implemented in Stata's survey estimation commands. Strata with a single sampling unit, certainty sampling units, subpopulation estimation, and poststratification will also be covered in some detail. Each topic will be illustrated with an example in a Stata session.


CE_33T Introduction to CART: Data Mining with Decision Trees
Instructor: Mikhail Golovnya

This tutorial is intended for the applied statistician wanting to understand and apply the CART methodology for tree-structured non-parametric data analysis. The emphasis will be on practical data analysis involving classification. All concepts will be illustrated using real-world examples. The course will begin with an intuitive introduction to tree-structured analysis: what it is, why it works, why it is non-parametric and model-free. Working through examples, we will review how to read CART output and how to set up basic analyses. This session will include performance evaluation of CART trees and will cover ways to search for possible improvements of the results. Once a basic working knowledge of CART has been mastered, the tutorial will focus on critical details essential for advanced CART applications including: choice of splitting criteria, choosing the best split, using prior probabilities to shape results, refining results with differential misclassification costs, the meaning of cross-validation, tree growing and tree pruning. The course will conclude with some discussion of the comparative performance of CART versus other computer-intensive methods such as artificial neural networks and statistician-generated parametric models. All attendees will receive 6 months' access to fully functional versions of the software.


CE_34T East® Architect: A Powerful New Statistical Environment for Designing, Monitoring and Simulating Clinical Trials
Instructor: Cyrus Mehta

This workshop introduces East® Architect, the entirely re-designed update of our industry standard clinical trial design tool. With the new East Architect you can now design, monitor and simulate single look, group sequential and adaptive trials with one arm, two or multiple arms - with binary, continuous or time-to-event endpoints. We'll tour the new interface - Cytel's Architect platform - enabling simultaneous previews of multiple design scenarios to optimize your choice of design parameters. You'll experience the advantages of reading in external source data sets for interim and final analyses, and when computing conditional power. Also reviewed: new simulation capabilities extending available designs through calls to external R functions, new capabilities for handling trials with delayed response and drop-outs that provide a more realistic assessment of early stopping costs vs. benefits, and how to produce customized reports with a special canvas combining plots, output and text.


CE_35T Group Sequential Analysis Using SAS® Software
Instructor: Yang Yuan

A group sequential trial provides for interim analyses before completion of a clinical trial. Group sequential methods can help prevent unnecessary exposure of patients to an unsafe new drug, or alternatively, to a placebo treatment if a new drug shows tremendous promise. This workshop reviews basic concepts of group sequential analysis and introduces two SAS/STAT® procedures: SEQDESIGN and SEQTEST. The SEQDESIGN procedure creates group sequential designs by computing boundary values with a variety of methods, including the O'Brien-Fleming, Whitehead, and error spending methods; it also computes required sample sizes. The SEQTEST procedure compares the test statistic with the corresponding boundary values at each stage so that the trial can be stopped to reject or accept the hypothesis; it also computes parameter estimates, confidence limits, and p-values. Numerous examples illustrate the capabilities of the software. Attendees should be familiar with hypothesis testing and with other SAS/STAT procedures such as the REG and LOGISTIC procedures.


CE_36T Multilevel/Mixed Models Using Stata
Instructor: Bill Rising

This workshop covers the use of Stata to fit multilevel (mixed) models, models that contain multiple levels of nested random effects. These effects may take the form of either random intercepts or random coefficients on regressors. After a brief review of the methodology, the course will demonstrate by example the fitting of such models in Stata. The focus will be primarily on linear (Gaussian) models, but binary and count responses will also be considered. In addition to models with nested effects, models with crossed effects will also be discussed. No prior knowledge of Stata is required, but familiarity with the methodology and experience in fitting these models by other means will prove useful.


CE_37T Introduction to MARS: Predictive Modeling with Nonlinear Automated Regression Tools
Instructor: Mikhail Golovnya

This workshop will introduce the main concepts behind Jerome Friedman's MARS, a modern regression tool that can help analysts quickly develop superior predictive models. MARS is a nonlinear automated regression tool that can trace complex patterns in the data. It automates the model specification search, including variable selection, variable transformation, interaction detection, missing value handling, and model validation. Conventional regression models typically fit straight lines to data. Although this usually oversimplifies the data structure, the approximation is sometimes good enough for practical purposes. However, in the frequent situations in which a straight line is inappropriate, an expert modeler must search tediously for transformations to find the right curve. MARS approaches model construction more flexibly, allowing for bends, thresholds, and other departures from straight lines from the beginning. Attendees will be presented with the key benefits over conventional regression tools and over a modeler's tedious search for transformations to find the right curve. All attendees will receive 6 months' access to fully functional versions of the software.


CE_38T 25 Years of Cytel Exact Software: Overview of StatXact® for Non-parametric Tests with Special Application to Correlated Categorical Data
Instructors: Christopher Corcoran and Pralay Senchaudhuri

In celebration of the 25th anniversary of the initial release of StatXact®, we'll review the fundamentals of performing exact inference for non-parametric tests as implemented in Cytel's industry-standard software. Small and sparse samples of correlated categorical data arise frequently in applied research, often where observations are clustered, as well as in multi-center clinical trials, epidemiologic and genetic disease studies, and developmental toxicology experiments. In these and other difficult settings, the large-sample assumptions underlying parametric models (e.g., using random effects) or marginal models (e.g., GEE) break down. We'll present exact alternatives, including a broad approach to permutation testing that provides exact correlated-data analogues of conventional tests such as Fisher's exact test, the trend test, and the Wilcoxon and Kruskal-Wallis tests for doubly-ordered tables. Using practical examples, we'll demonstrate appropriate use of these tests in StatXact® and in a new SAS PROC.


CE_39T Structural Equation Modeling Using the CALIS Procedure in SAS/STAT® Software: Basic and Advanced Topics
Instructors: Yiu-Fai Yung and Xinming An

The CALIS procedure in SAS/STAT software is a general structural equation modeling (SEM) tool. This workshop introduces the general methodology of SEM and applications of PROC CALIS. Background topics such as path analysis, confirmatory factor analysis, measurement error models, and linear structural relations (LISREL) are reviewed. Applications are demonstrated with examples in social, educational, behavioral, and marketing research. More advanced SEM techniques such as the full information maximum likelihood (FIML) method for treating incomplete observations, robust estimation, and diagnostics for outliers and leverage points in the SEM context are also covered. This workshop is designed for statisticians and data analysts who want an overview of SEM applications using the CALIS procedure in SAS/STAT 9.22 and later releases. Attendees should have a basic understanding of regression analysis and experience using the SAS language. Previous exposure to SEM is useful but not required. Attendees will learn how to use PROC CALIS for (1) specifying structural equation models with latent variables, (2) interpreting model fit statistics and estimation results, (3) using the FIML method for treating incomplete observations, and (4) detecting outliers and leverage points.


CE_40T Advances in Data Mining: Jerome Friedman's TreeNet/MART and Leo Breiman's Random Forests
Instructor: Mikhail Golovnya

This workshop will present Leo Breiman's Random Forests and Jerome Friedman's TreeNet/MART (also known as TreeNet Stochastic Gradient Boosting). Random Forests and TreeNet/MART are new advances in classification and regression tree software that enable the modeler to construct predictive models of extraordinary accuracy. Random Forests is a tree-based procedure that makes use of bootstrapping and random feature generation. In TreeNet, classification and regression models are built gradually through a potentially large collection of small trees, each of which improves on its predecessors through an error-correcting strategy. I will show how the software is used to solve real-world data mining problems, cover theory and discuss what is novel in the software, cover implementation, compare the two methodologies, and show where the software fits in terms of other data mining software. All attendees will receive 6 months' access to fully functional versions of the software.



Monday Roundtables and Speaker Luncheons


Tuesday Roundtables and Speaker Luncheons

TL03 Statistical Issues in Designing Non-inferiority Studies with an Emphasis on Veterinary Medical Issues
Sponsor: Biopharmaceutical Section

Anna Nevius, Food and Drug Administration/CVM


The implications of related guidance from FDA and other institutions will be discussed.

Fee for this session includes continental breakfast.



Wednesday Roundtables and Speaker Luncheons

WL09 Wanted: Statistics Classroom Examples
Sponsor: Section on Teaching of Statistics in the Health Sciences

Amy Nowacki, Cleveland Clinic


This discussion is for educators who are tired of using the same examples year after year and are interested in generating their own or discovering newly available medical examples.

Fee for this session includes continental breakfast.