Online Program


Thursday, February 23
Registration		Thu, Feb 23, 7:00 AM - 6:30 PM






SC1 Art and Practice of Classification and Regression Trees		Thu, Feb 23, 8:00 AM - 5:30 PM River Terrace 2
Instructor(s): Wei-Yin Loh, University of Wisconsin


It is more than 50 years since the first regression tree algorithm (AID, Morgan and Sonquist 1963) appeared. Rapidly increasing use of tree models among practitioners has stimulated many algorithmic advances over the last two decades. Modern tree models have higher prediction accuracy, increased computational speed, and negligible variable selection bias. They can fit linear models in the nodes using GLM, quantile, and other loss functions; response variables may be multivariate, longitudinal, or censored; and classification trees can employ linear splits and fit kernel and nearest-neighbor node models. The aims of the course are: (i) to briefly review the capabilities of the state-of-the-art methods and (ii) to show how to exploit free software to analyze data from initial data exploration to a final interpretable prediction model. Example applications include subgroup identification for precision medicine, missing value imputation, and propensity score estimation in sample surveys.


SC2 Becoming a Student of Leadership in Statistics		Thu, Feb 23, 8:00 AM - 12:00 PM City Terrace 7
Instructor(s): Matthew Gurka, University of Florida; Robert Rodriguez, SAS Institute Inc.; Gary R. Sullivan, Eli Lilly & Company


What is leadership? Much has been written and discussed within the statistics profession in the last few years on the topic and its importance in advancing our profession. This course will provide an introductory understanding of leadership as well as initial direction for statisticians who wish to develop as leaders. It will feature a leader in the statistics profession speaking on their personal journey as well as providing guidance on personal leadership development. You will also be introduced to some important leadership competencies - including influence, business acumen, and communication - and will begin to draft a plan for (1) developing your own leadership or (2) addressing a leadership challenge in your work. Finally, you will spend time reflecting on leadership learnings and networking with other statisticians and practitioners.


SC3 Peering into the Future: Introduction to Time Series Methods for Forecasting		Thu, Feb 23, 8:00 AM - 12:00 PM City Terrace 9
Instructor(s): Dave Dickey, North Carolina State University


This workshop will provide a practical guide to time series analysis and forecasting, focusing on examples and applications in modern software. Students will learn how to recognize autocorrelation when they see it and how to incorporate autocorrelation into their modeling. Models in the ARIMA class and their identification, fitting, and diagnostic testing will be emphasized and extended to models with deterministic trend functions (inputs) and ARMA errors. Diagnosing stationarity, a critical feature for proper analysis, will be demonstrated. After the course, students should be able to identify, fit, and forecast with this class of time series models and be aware of the consequences of having autocorrelated data. They should be able to recognize nonstationary cases in which the differences in the data, rather than the levels, should be analyzed. Underlying ideas and interpretation of output, rather than code, will be emphasized. No previous experience with any particular software is needed. Examples will be computed in SAS, but most modern statistical packages such as SPSS, R, STATA, etc. can be used for time series analysis.
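
As a rough illustration of the point about analyzing differences rather than levels, the following minimal Python sketch (the course itself uses SAS) simulates a random walk and compares the lag-1 sample autocorrelation of the levels with that of the first differences. The simulated series, the seed, and the helper name `lag1_autocorr` are assumptions made for illustration only; they are not course material.

```python
import random


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a sequence of numbers."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


# Hypothetical data: a random walk, a simple nonstationary series.
rng = random.Random(0)
levels = [0.0]
for _ in range(499):
    levels.append(levels[-1] + rng.gauss(0, 1))

# First differences of a random walk are independent noise.
diffs = [b - a for a, b in zip(levels, levels[1:])]

r_levels = lag1_autocorr(levels)
r_diffs = lag1_autocorr(diffs)
print(f"lag-1 autocorrelation of levels:      {r_levels:.3f}")
print(f"lag-1 autocorrelation of differences: {r_diffs:.3f}")
```

In this simulated case the levels are strongly autocorrelated while the differences are not, which is the kind of pattern that suggests modeling the differences.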


SC4 Producing High-Quality Figures in SAS to Meet Publication Requirement, with Practical Examples		Thu, Feb 23, 8:00 AM - 12:00 PM City Terrace 12
Instructor(s): Charlie Chunhua Liu, Allergan PLC


This half-day short course will cover publication requirements for high-quality figures, discuss principles for producing high-quality figures in SAS, and demonstrate the use of both SAS/GRAPH and ODS Graphics procedures to produce some commonly used types of figures (line plots, scatter plots, bee swarm plots, box plots, box plots overlaid with bee swarm plots, etc.). The instructor will also demonstrate how to produce these figures in listing formats (EMF, EPS, etc.) and document formats (RTF, PDF, etc.).


SC5 Linear Mixed Models Through Health Sciences Applications		Thu, Feb 23, 8:00 AM - 12:00 PM River Terrace 3
Instructor(s): Constantine Daskalakis, Thomas Jefferson University


This course will focus on the heuristic understanding of linear mixed models and their implementation (including assessment of assumptions and model fit, and interpretation of results), rather than formal statistical theory. The following general topics will be covered: a. Specification and interpretation of the fixed effects (population-averaged/mean) model. b. Specification and interpretation of the random effects and their covariance structure (subject-specific effects). c. Considerations regarding the error structure. d. Statistical and graphical methods of assessment of (a), (b), and (c), and model selection strategies. e. Determination, estimation, and testing of linear combinations/contrasts of coefficients to address scientific objectives. f. Writing brief summaries of the results for non-statistical audiences. These topics will be addressed through the analysis of data from two studies: (1) a school-based intervention program designed to impact students’ body mass index (BMI); and (2) an animal xenograft experiment designed to assess the effects of a drug and of radiotherapy on tumor growth.


SC6 Text Analytics and Its Applications		Thu, Feb 23, 1:30 PM - 5:30 PM City Terrace 7
Instructor(s): Edward Jones, Texas A&M University


Text analytics refers to the process of deriving actionable insights from text data. This half-day course explores the evolution and creative application of text analytics to solving business problems. Emphasis is placed on how text analytics is used for solving typical forecasting and classification problems by integrating structured and unstructured text data. Solutions are illustrated using SAS Text Miner, R and Python with real world applications in finance and social media.


SC7 Expressing Yourself with R		Thu, Feb 23, 1:30 PM - 5:30 PM City Terrace 9
Instructor(s): Hadley Wickham, RStudio


In this mini-workshop you'll learn how to better express yourself in R. To express yourself clearly in R you need to know how to write high-quality functions and how to use a little functional programming (FP) to solve common programming challenges. You'll learn: * The three key properties of a function. * A proven strategy for writing new functions. * How to use functions to reduce duplication in your code. * How `lapply()` works and why it's so important. * A handful of FP tools that increase the clarity of your code. This workshop is suitable for beginning and intermediate R users. You need to know the basics of R (like importing your data and executing basic instructions). If you're an advanced R user, you probably won't learn anything completely new, but you will learn techniques that allow you to solve new challenges with greater ease. The workshop will be hands-on and interactive, so please make sure to bring along your laptop with R installed!


SC8 Missing Data Analysis with R/SAS/Stata		Thu, Feb 23, 1:30 PM - 5:30 PM City Terrace 12
Instructor(s): Din Chen, The University of North Carolina at Chapel Hill; Frank Liu, Merck Research Labs


Missing data are nearly universal in applied research. Almost all applied researchers have faced the problem of missing data at some point. However, not all researchers assess missingness or use appropriate methods to deal with the missing data. Instead, researchers often drop the missing values (e.g., listwise deletion), which reduces the sample size and lowers statistical power, or use ad hoc single imputation such as last observation carried forward (LOCF) for simplicity. Both approaches can bias parameter estimates. Such inefficient and potentially biased statistical inference can lead to erroneous research conclusions. This short course aims to address the problems of missing data. The different missing data mechanisms or typologies, including missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR), will be discussed with illustrations from real clinical trial examples. Moreover, this short course will introduce how to conduct two commonly used model-based methods for missing data analysis, multiple imputation (Little & Rubin, 2002; Reiter & Raghunathan, 2007) and maximum likelihood (Allison, 2012), using R/SAS. Some sensitivity analysis approaches to handling missing data under MNAR will also be discussed briefly.


SC9 Bootstrap Methods and Permutation Tests		Thu, Feb 23, 1:30 PM - 5:30 PM River Terrace 3
Instructor(s): Tim Hesterberg, Google


We begin with a graphical approach to bootstrapping and permutation testing, illuminating basic statistical concepts of standard errors, confidence intervals, p-values and significance tests. We consider a variety of statistics (mean, trimmed mean, regression, etc.), and a number of sampling situations (one-sample, two-sample, stratified, finite-population), stressing the common techniques that apply in these situations. We'll look at applications from a variety of fields, including telecommunications, finance, and biopharm. These methods let us do confidence intervals and hypothesis tests when formulas are not available. This lets us do better statistics, e.g. use robust methods (we can use a median or trimmed mean instead of a mean, for example). They can help clients understand statistical variability. And some of the methods are more accurate than standard methods.
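
For readers new to these ideas, here is a minimal Python sketch of a percentile bootstrap for a trimmed mean and a two-sample permutation test for a difference in means. It is an illustration under stated assumptions, not the instructor's material: the data values, function names, replicate counts, and seeds are all hypothetical.

```python
import random
import statistics


def trimmed_mean(x, prop=0.1):
    """Mean after dropping the lowest and highest `prop` fraction of values."""
    s = sorted(x)
    k = int(len(s) * prop)
    return statistics.fmean(s[k:len(s) - k])


def bootstrap_se_ci(data, stat, n_boot=2000, seed=0):
    """Bootstrap standard error and 95% percentile interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Resample the data with replacement and recompute the statistic each time.
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    se = statistics.stdev(reps)
    lo = reps[int(0.025 * n_boot)]
    hi = reps[int(0.975 * n_boot) - 1]
    return se, (lo, hi)


def permutation_pvalue(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        # Shuffling the pooled values reassigns group labels at random.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:len(a)])
                   - statistics.fmean(pooled[len(a):]))
        if diff >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)


# Hypothetical one-sample data with a single outlier.
sample = [12.1, 9.8, 11.4, 10.9, 13.2, 10.1, 35.0, 11.7, 10.4, 12.6]
se, (lo, hi) = bootstrap_se_ci(sample, trimmed_mean)
print(f"trimmed mean = {trimmed_mean(sample):.2f}, bootstrap SE = {se:.2f}")
print(f"95% percentile interval: ({lo:.2f}, {hi:.2f})")
```

The same `bootstrap_se_ci` helper accepts any statistic, for example `statistics.median`, which reflects the abstract's point that no formula for the standard error is needed.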


PS1 Poster Session 1 and Opening Mixer		Thu, Feb 23, 5:30 PM - 7:00 PM Conference Center AB


Chair(s): Nancy Wang, Celerion

	Web Scraping Government Tax Revenue with Machine Learning Brian Arthur Dumbacher, U.S. Census Bureau
	A Hierarchical Clustering Analysis (HCA) in Automatic Driving Regarding Vehicle-to-Vehicle Pedestrian Position Identification Jie Xue, Purdue University
	Communicating Statistics to Nonstatisticians Kim Love, K. R. Love Quantitative Consulting and Collaboration
	Good Statistical Practices: An Example of Meta-Analysis of Odds Ratios Bei-Hung Chang, University of Massachusetts Medical School
	A New Method to Assess Measurement Agreement in Machine Readings Dong-Yun Kim, NHLBI/NIH
	Smoking Tendencies Among Junior High-School Students in Ghana: Applications of ROC Curve and AUC Emmanuel Thompson, Southeast Missouri State University
	An Application of Competing Risk Analysis in Large Cardiovascular Clinical Trials Purva Jain, Beth Israel Deaconess Medical Center, Harvard Medical School
	Profile Monitoring for Poisson Data with Fixed Effects Using Nonparametric Methods Sepehr Piri, Virginia Commonwealth University
	Interaction of Measurement Burden and Disease in the ENRICHD Clinical Trial Su-Yun Han, NHLBI/NIH
	Probabilistic Record Linkage in R and Stata Anders R Alexandersson, Florida Cancer Data System
	Creating Reproducible Tables in R Markdown Claire Palmer, University of Colorado at Denver School of Medicine
	More Than Meets the Eye: Bayesian Inference in Nonparanormal Graphical Models Jami Jackson Mulgrave, North Carolina State University
	Communicating with Clinicians About Models That Predict Risk Using Interactive Web Graphics Marshall Brown, Fred Hutchinson Cancer Research Center
	R Resample Package Tim Hesterberg, Google
	StatTag: A Reproducible Research Tool for Generating Dynamic Documents Using Microsoft Word Abigail S Baldridge, Northwestern University


Exhibits Open		Thu, Feb 23, 5:30 PM - 7:00 PM Conference Center AB






Friday, February 24
Registration		Fri, Feb 24, 7:30 AM - 5:30 PM






Continental Breakfast		Fri, Feb 24, 7:30 AM - 8:30 AM Conference Center AB






Exhibits Open		Fri, Feb 24, 7:30 AM - 6:30 PM Conference Center AB






GS1 Keynote Address		Fri, Feb 24, 8:00 AM - 9:00 AM River Terrace 1


Chair(s): MoonJung Cho, Bureau of Labor Statistics

	Snakes and Ladders: Challenges in Forging a Career in Statistics David Lane Banks, Duke University


CS01 Presentation and Storytelling		Fri, Feb 24, 9:15 AM - 10:45 AM River Terrace 2


Chair(s): Cynthia R. Long, Palmer College of Chiropractic

9:20 AM	The Statistician’s Role in Data Storytelling Projects: Case Studies and Best Practices Haviland Wright, Boston University
10:05 AM	Statistical Presentation Power: How to Reveal Your 'X Factor'!!! Jennifer H Van Mullekom, Laboratory for Interdisciplinary Statistical Analysis (LISA), Virginia Tech


CS02 Beyond the Basics: Advanced Modeling Methods		Fri, Feb 24, 9:15 AM - 10:45 AM River Terrace 3


Chair(s): Shankang Qu, PepsiCo

9:20 AM	Don't Be Silly; Do It Bayesian Perceval Sondag, Arlenda
10:05 AM	Improve Regression and Communicate Results Using Stochastic Gradient Boosting and LASSO Charles William Harrison, Salford Systems


CS03 Data Wrangling and Visualization		Fri, Feb 24, 9:15 AM - 10:45 AM City Terrace 7


Chair(s): Robert P. Yerex, University of Virginia Medical Center

9:20 AM	Discover and Visualize the Golden Paths, Unique Sequences, and Marvelous Associations Out of Your Big Data Using Link Analysis in SAS Enterprise Miner Delali Agbenyegah, Alliance Data Card Services
10:05 AM	Data Scraping, Parsing, Wrangling, and Cleaning Mark Daniel Ward, Purdue University


CS04 Keep It Simple with R		Fri, Feb 24, 9:15 AM - 10:45 AM City Terrace 9




9:20 AM	Reproducibility in Action Richard Thomas Schwinn, U.S. Small Business Administration
10:05 AM	Managing Many Models Hadley Wickham, RStudio


CS05 Statistical Collaboration		Fri, Feb 24, 11:00 AM - 12:30 PM River Terrace 2


Chair(s): Michael Latta, YTMBA Research & Consulting and Coastal Carolina University

11:05 AM	Practical Examples and Challenges of Statistical Consulting in Health Settings Laura H Gunn, Stetson University
11:50 AM	Panel Discussion on Statistical Volunteers David J Corliss, Peace-Work


CS06 Business Intelligence Practices		Fri, Feb 24, 11:00 AM - 12:30 PM River Terrace 3


Chair(s): Madhuri Mulekar, University of South Alabama

11:05 AM	On the Street: Conducting Business Research Joyce Nilsson Orsini, Fordham University GBA
11:50 AM	Bridging the Gap on Multi-Channel Attribution John Lin, Epsilon Data Management


CS07 Surveys and Sentiment Analysis		Fri, Feb 24, 11:00 AM - 12:30 PM City Terrace 7


Chair(s): Susan Simmons, NC State Institute for Advanced Analytics

11:05 AM	The Nexus Between Data Science, Survey Design, and Statistical Practice Steven B Cohen, RTI International
11:50 AM	Sentiment Analysis of Brand Social Mentions: The Polarity Classification and Beyond Jin Su, Johnson & Johnson Vision Care, Inc.


CS08 Interactivity with R Shiny		Fri, Feb 24, 11:00 AM - 12:30 PM City Terrace 9


Chair(s): Edward Mulrow, NORC at the University of Chicago

11:05 AM	Working with Shiny Things Harlen Hays, Cerner Corporation
11:50 AM	Rapid Data Visualization and Dissemination Using R and Shiny Bogdan Alexandru Rau, UCLA Center for Health Policy Research


Lunch (on own)		Fri, Feb 24, 12:30 PM - 2:00 PM






CS09 Organizational Impact		Fri, Feb 24, 2:00 PM - 3:30 PM River Terrace 2


Chair(s): Kathy Hanford, University of Nebraska-Lincoln

2:05 PM	Developing a Data Science Center of Excellence (DS CoE) Celeste R Fralick, Unaffiliated
2:50 PM	My Marathon Journey for Analytics Change Terri Henderson, Johnson & Johnson Vision Care, Inc.


CS10 Probability Distributions		Fri, Feb 24, 2:00 PM - 3:30 PM River Terrace 3


Chair(s): Kathleen Jablonski, The George Washington University

2:05 PM	Modeling Proportions and Probabilities: The Beta Distribution Is Your Friend Paul Teetor, William Blair & Co.
2:50 PM	Probability Density for Repeated Events Bruce Stephen Lund, Magnify Analytic Solutions


CS11 Text Analytics		Fri, Feb 24, 2:00 PM - 3:30 PM City Terrace 7


Chair(s): Laura H Gunn, Stetson University

2:05 PM	Predicting Regulatory Risk from Unstructured Text Data Danielle Leigh Boree, Johnson & Johnson Vision Care, Inc.
2:50 PM	Using Text Analytics and Signal Detection to Predict Medical Device Recalls Lisa Ensign, Significant Statistics


CS12 Going Big on Bayesian		Fri, Feb 24, 2:00 PM - 3:30 PM City Terrace 9


Chair(s): Alok Kumar Dwivedi, Texas Tech University Health Sciences Center

2:05 PM	Introduction to Bayesian Analysis Using Stata Chuck Huber, StataCorp
2:50 PM	Bayesian Structural Equation Modeling M'hamed Hamy Temkit, Mayo Clinic


CS13 Career and Personal Development		Fri, Feb 24, 3:45 PM - 5:15 PM River Terrace 2


Chair(s): Rich Newman, Johnson & Johnson Vision Care, Inc.

3:50 PM	Soft Skills for Succeeding Outside of Academia Diahanna L Post, Nielsen
4:35 PM	Career Development for Statisticians in a Collaborative Environment: Importance of Effective Mentoring and Development of Soft Skills Jay N Mandrekar, Mayo Clinic


CS14 Addressing Statistical Problems and Issues		Fri, Feb 24, 3:45 PM - 5:15 PM River Terrace 3


Chair(s): Bonita Singal, United States Department of Energy

3:50 PM	Data Preparation: The Key for Meaningful Insights Huiyu Qian, AutoAnything Inc.
4:35 PM	Matched Case-Control Data Analysis Yinghui Duan, Connecticut Institute for Clinical and Translational Science


CS15 Machine Learning		Fri, Feb 24, 3:45 PM - 5:15 PM City Terrace 7


Chair(s): John Stevens, Utah State University

3:50 PM	Tree-Based Techniques for High-Dimensional Data Wei-Yin Loh, University of Wisconsin
4:35 PM	Intro to Deep Learning with TensorFlow Denisa A.O. Roberts, ASAPP Inc.


CS16 Generalized Linear Mixed Models with R		Fri, Feb 24, 3:45 PM - 5:15 PM City Terrace 9


Chair(s): Doug Lehmann, University of Miami

3:50 PM	Constructing and Analyzing Generalized Linear Mixed Models Christina P Knudson, Macalester College
4:35 PM	Simulation and Power Analysis of Generalized Linear Mixed Models Brandon LeBeau, University of Iowa


PS2 Poster Session 2 and Refreshments		Fri, Feb 24, 5:15 PM - 6:30 PM Conference Center AB


Chair(s): Huanjun Zhang, Texas A&M University

	Use of Longitudinal Models to Identify Subject-Specific Implausible Body Mass Index Measures: A Comparison with Screening for Population-Level Outliers Carrie Tillotson, OCHIN, Inc.
	Enhancing Monthly Retail Holiday Effect Methodology Through Daily Data Rebecca Jean Hutchinson, U.S. Census Bureau
	Statistics for Public Policy: Reflecting on Change in the Last 50 Years Karen Moran Jackson, The University of Texas at Austin
	Data Science, Statistics, Analytics, Data Engineering: What Does It All Mean? Michael Latta, Coastal Carolina University
	Iterative Semiparametric Generalized Linear Models Busayasachee Puang-Ngern, Macquarie University
	Expanding the Appeal of Model Selection Using Mixture Priors to Incorporate Expert Opinion: A Behavioral Economic Case Study Christopher T Franck, Virginia Tech
	Performance of Data Mining Methods in an Example with Ordinal and Imbalanced Data Elena Rantou, FDA
	Additive P-Value Combinations and an Application in Consumer Product Research Georgette Asherman, Direct Effects, LLC
	Pseudo-Maximum Likelihood Estimation with Sampling Weight for Modeling Count Data from a Complex Survey Lin Dai, Medical University of South Carolina
	Two-Step Logistic Regression Model for Predicting Phone Campaign Response Sharon (Renting) Xu, AARP, Inc.
	Detecting Interaction in Two-Way Unreplicated Experiments via Bayesian Model Selection Thomas Anthony Metzger, Virginia Tech
	Analyzing Shot Data with MANOVA Victoria Cox, Dstl
	What Effort Is Needed in Explaining Statistical Results to Pediatric Researchers? A Survey to Better Understand the Confidence and Knowledge of Pediatric Researchers Curtis Dean Travers, Emory University School of Medicine
	Rapid Data Visualization and Dissemination Using R and Shiny Bogdan Alexandru Rau, UCLA Center for Health Policy Research
	ShinySurvival: An Interactive Tool for Visualizing and Analyzing Survival Data in R Felicia Powell Hardnett, CDC
	Side-by-Side Bar Charts for More Than One Variable on Different Scales Using SAS SGPLOT John Stephen Taylor, Johnson & Johnson Vision Care, Inc.
	Good Old Excel: Using an Old Favorite to Explore, Visualize, and Share Data Nola du Toit, NORC at the University of Chicago


Saturday, February 25
Registration		Sat, Feb 25, 7:30 AM - 2:30 PM






Exhibits Open		Sat, Feb 25, 7:30 AM - 1:00 PM Conference Center AB






PS3 Poster Session 3 and Continental Breakfast		Sat, Feb 25, 8:00 AM - 9:15 AM Conference Center AB


Chair(s): Michael Devin Floyd, Saint Software

	Variations in Statistical Practice Between North-American Stat Labs Eric Vance, University of Colorado at Boulder
	Not Just a Statistician: Experience on How to Communicate with Your Client Kate Wan-Chu Chang, University of Michigan
	Listwise Deletion or Multiple Imputation When Complex Sample Data Are MCAR or MAR: A Guide to Selecting an Appropriate Missing Data Treatment Method Anh P. Kellermann, University of South Florida
	Analysis of Bird Arrival Dates in Cayuga County Caitlin Mary Cunningham, Le Moyne College
	A Guide to Modeling Strategies for Immunological Count Data Claire Palmer, University of Colorado at Denver School of Medicine
	Blending Big Data Visualization Tools with Statistical Analysis: Improving Automotive Lubricants Jim McAllister, Afton Chemical Corporation
	Partial Least Squares Regression Analysis Identifies Interleukin-1 Receptor as a Predictor of Airway Neutrophils in Asthma Michael David Evans, University of Wisconsin-Madison
	Data Fusion Techniques for Estimating the Relative Abundance of Rare Species Purna Gamage, Texas Tech University
	Statistical Comparison of Particle Size Distributions Scott J Richter, The University of North Carolina at Greensboro
	Optimal Experimental Designs for Mixed Categorical and Continuous Responses Soohyun Kim, Arizona State University
	Balanced Salary Structure Modeling Thor Dane Osborn, Sandia National Laboratories
	Sequential Pattern Mining in Real-Time Marketing with Backward Match Algorithm Yi Cao, Alliance Data Card Services
	Using Shiny to Efficiently Process Survey Data Carl Ganz, UCLA Center for Health Policy Research
	Integrating R Programming Platforms into Community Collective Impact Efforts to Solve Social Problems Frank M Ridzi, Central New York Community Foundation and Le Moyne College
	Generating Tables and Statistical Summary Using PROC REPORT Lei Zhang, University of Minnesota
	A Nonlinear Regression Plugin for Rcmdr Thomas Edward Burk, University of Minnesota
	Predict Warriors' 73-Win on April 13, 2016 Jason Li, Morrill Learning Center


CS17 Ethical Guidelines		Sat, Feb 25, 9:15 AM - 10:45 AM River Terrace 2


Chair(s): Constantine Daskalakis, Thomas Jefferson University

9:20 AM	How to Deal with Ethical Issues of Human Subjects, Difficult Colleagues, and Networking Michael Latta, YTMBA Research & Consulting and Coastal Carolina University
10:05 AM	How the New ASA Guidelines Help Practicing Statisticians Alan C. Elliott, Southern Methodist University


CS18 Going Mainstream: Emerging Modeling Methods		Sat, Feb 25, 9:15 AM - 10:45 AM River Terrace 3


Chair(s): Viswanathan Ramakrishnan, Medical University of South Carolina

9:20 AM	Amazon Product Co-Purchasing Network Estimation Through ERGM Model Using Reference Prior Sayan Chakraborty, Michigan State University
10:05 AM	Integrating Text Analytics with Traditional Structured Analytics Edward Jones, Texas A&M University


CS19 Guided and Automatic Model Selection		Sat, Feb 25, 9:15 AM - 10:45 AM City Terrace 7


Chair(s): Inyoung Kim, Virginia Tech

9:20 AM	Best Practices in Model Selection and Profiling Scott Lee Wise, SAS Institute, Inc.
10:05 AM	Designing Automated Workflows for Model Selection and Optimization Christian Kendall, Salford Systems


CS20 A Graph Is Worth a Thousand Words		Sat, Feb 25, 9:15 AM - 10:45 AM City Terrace 9


Chair(s): Andrew D. Althouse, University of Pittsburgh Medical Center

9:20 AM	Geospatial Analysis with R Michael Jadoo, Unaffiliated
10:05 AM	How to Avoid Some Common Graphical Mistakes Naomi B. Robbins, NBR


CS21 Communicating to Motivate and Influence		Sat, Feb 25, 11:00 AM - 12:30 PM River Terrace 2


Chair(s): Yuanyuan Tang, Saint Luke's Health System

11:05 AM	The Psychology of Influence Colleen Mangeot, Cincinnati Children's Hospital Medical Center
11:50 AM	Strategic Marketing and Communication for Statistical Consultants and Collaborators Renita Canady, Association of American Medical Colleges


CS22 In THIS Corner: X1! When Model Variables Compete		Sat, Feb 25, 11:00 AM - 12:30 PM River Terrace 3


Chair(s): Michael Reiger, West Virginia University

11:05 AM	Estimating with Weights: Common Sense vs. Unbiasedness Tim Hesterberg, Google
11:50 AM	Competing Risk Data and Semi-Competing Risk Data Analysis and Visualization in SAS and R Ran Liao, Indiana University


CS23 Latent Variable and Mixed Effects Models		Sat, Feb 25, 11:00 AM - 12:30 PM City Terrace 7


Chair(s): Xianggui (Harvey) Qu, Oakland University

11:05 AM	Nonparametric Mixed-Effects Regression for Large Samples Nathaniel Erik Helwig, University of Minnesota
11:50 AM	Optimization of Processes and Products from Historical (Un-Designed) Data John F. MacGregor, ProSensus, Inc.


CS24 It's a Package Deal		Sat, Feb 25, 11:00 AM - 12:30 PM City Terrace 9


Chair(s): John Castelloe, SAS Institute

11:05 AM	Introduction to JMP Software Terrie Vasilopoulos, University of Florida
11:50 AM	Logistic Regression Cross-Package Comparison Lillian Ma, Capital One Bank


Lunch (on own)		Sat, Feb 25, 12:30 PM - 2:00 PM






PCD1 Power and Sample Size Analysis Using Stata		Sat, Feb 25, 2:00 PM - 4:00 PM City Terrace 6
Instructor(s): Chuck Huber, StataCorp


Power and sample size analysis is a fundamental step in the planning of any research project. This talk will demonstrate how to use Stata's power command to calculate power, sample size and minimum detectable effect size. We will show how to create customized tables and graphs for many study designs with both continuous and categorical outcomes. We will also demonstrate how to add your own methods to the power command and how to calculate power for multilevel/longitudinal studies using simulation.
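
To make the quantities concrete, the sketch below computes sample size and power for a two-sided, two-sample comparison of means using the usual normal approximation, n per group = 2((z_{1-α/2} + z_{power})σ/δ)². It is written in Python and is not Stata's power command; the function names and the example effect size are assumptions for illustration, and exact t-based calculations give slightly larger sample sizes.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.8):
    """Approximate per-group sample size to detect a mean difference `delta`."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


def power_two_sample(n, delta, sigma, alpha=0.05):
    """Approximate power of a two-sided test with `n` subjects per group."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    # Standardized difference: delta divided by the standard error of the
    # difference in means, sigma * sqrt(2 / n).
    ncp = delta / (sigma * math.sqrt(2 / n))
    return z.cdf(ncp - z_alpha) + z.cdf(-ncp - z_alpha)


# Hypothetical design: detect a half-standard-deviation difference.
n = n_per_group(delta=0.5, sigma=1.0)
print(f"n per group: {n}")
print(f"approximate power at that n: {power_two_sample(n, 0.5, 1.0):.3f}")
```

Solving the same relationship for `delta` at fixed `n` and power gives the minimum detectable effect size mentioned in the abstract.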


PCD2 Xymp: A Web Application Supporting Best Practices in Bioassay		Sat, Feb 25, 2:00 PM - 4:00 PM City Terrace 8
Instructor(s): David Lansky, Precision Bioassay, Inc.


The software system consists of three components (each on a different virtual server): a web application (written in PHP), a database, and a collection of R programs, packages, and reports (Sweave and knitr). The system helps users perform randomized instances of routine bioassays, performs mixed model analyses (using linear or non-linear models), and produces reports (including summaries). The statistical portion works well with simple or complex designs (from CRD to a strip-unit). The system contains many features to meet regulatory requirements (users with different levels of authorization, automatic tracking and reporting of re-analyses of data, etc.). The system is designed to be very easy for routine use in the lab, while providing a rich collection of modern statistical capabilities. The system is also designed to facilitate good collaboration between bioassay scientists and statisticians. Each assay has a protocol, each analysis has a protocol, and the protocols capture all the statistical details; the lab users select protocols by name. The statisticians build the protocols.


PCD3 Marketing Mix Modeling and Optimization Using Bayesian Networks and BayesiaLab		Sat, Feb 25, 2:00 PM - 4:00 PM City Terrace 12
Instructor(s): Stefan Conrady, Bayesia USA


“Half the money I spend on advertising is wasted; the trouble is I don’t know which half.” Over the last century, various versions of this quote have been attributed to John Wanamaker, Henry Ford, and Henry Procter, among others. Yet, 100 years after these marketing pioneers, in this day and age of big data and advanced analytics, the quote still rings true among marketing executives. The ideal composition of advertising and marketing efforts remains the industry's Holy Grail. The current practice remains “more art than science.” The lack of a well-established marketing mix methodology has little to do with the domain itself. Rather, it reflects the fact that marketing is yet another domain that typically has to rely on non-experimental data for decision support. The single most important thing we need to recognize about marketing mix modeling is that it is a causal question. This means, we are not looking for a prediction of an outcome variable based on the observation of marketing variables. Rather, we are looking to manipulate marketing variables to optimize an outcome variable. Thus, we are performing an intervention, which requires us to perform causal inference. This leads us to the Holy Grail of statistics, i.e. causal inference from observational data. In this workshop, we introduce the basic concepts of graphical models and how they can help us perform causal identification, e.g. using causal assumptions and the well-known Adjustment Criterion. While this is straightforward in theory, the complexity of the marketing domain prevents the practical application of this criterion. Thus, we introduce a new criterion (Shpitser and VanderWeele, 2011) that reduces the number of assumptions that we require for confounder selection and causal identification.
Implementation with BayesiaLab
With the confounders identified, we can now build a high-dimensional statistical model that represents the joint probability distribution of all marketing variables. We do that using the machine-learning algorithms of the BayesiaLab software platform. We obtain a Bayesian network that represents a multitude of relationships between all marketing variables and the outcome variable. Using BayesiaLab’s visualization functions, we can compare the machine-learned graph to our understanding of the domain. Furthermore, we can examine the (mostly nonlinear) response curves of the outcome variable as a function of the marketing variables. Most importantly, we use BayesiaLab to perform Likelihood Matching on all confounders to establish the causal response of the outcome variable. With all causal response curves computed, we introduce cost functions for the marketing variables via BayesiaLab’s Function Node. On that basis, we proceed to BayesiaLab’s Target Optimization function, which, by means of a genetic algorithm, searches for an optimal combination of all marketing variables, while being subject to constraints of individual variables and an overall marketing budget constraint. The optimization report shows feasible solutions along with the degree of achievement.


PCD4 Dig Deeper and Uncover the Unexpected with JMP 13 and JMP Pro 13		Sat, Feb 25, 2:00 PM - 4:00 PM City Terrace 10
Instructor(s): Mia Stephens, SAS Institute, Inc.; Scott Lee Wise, SAS Institute, Inc.


This session will cover how we can meet the challenge to explore, model and experiment on complex data analytic needs by:
• Increasing Ease and Efficiency of Preparing and Accessing Data
• Handling and Exploring all Types of Data, including Text
• Providing Next Generation Analytical Tools in Quality, DOE & Reliability
• Unleashing Advanced Analytics in Predictive Modeling
• Improving the Ways to Share and Report Out Analytics and Graphs
We will feature new ground-breaking methodology on relevant demos to maximize participant learning.


T1 Understanding and Working with Different (and Sometimes Difficult) People		Sat, Feb 25, 2:00 PM - 4:00 PM City Terrace 7
Instructor(s): Colleen Mangeot, Cincinnati Children's Hospital Medical Center


Do you have coworkers, researchers, or clients that are difficult to work with? Do you feel frustrated and/or confused about how to work with them? Do you wonder sometimes why they just don’t get it? This session will introduce the DISC model for understanding and working with different and sometimes difficult people. It will involve case studies and examples. The result? Improved relationships, increased effectiveness, greater influence, and ability to motivate others.


T2 Penalized Regression Methods for Generalized Linear Models in SAS/STAT		Sat, Feb 25, 2:00 PM - 4:00 PM City Terrace 9
Instructor(s): G Gordon Brown, SAS Institute, Inc.


Regression problems that have large numbers of candidate predictor variables occur in a wide variety of scientific fields and in business. These problems require you to perform statistical model selection to find an optimum model that is simple and has good predictive performance. For linear and generalized linear models you will see how to use the forward, backward, stepwise, and LASSO methods of variable selection. This tutorial presents modern variable selection methods for linear models using the adaptive LASSO, group LASSO, and elastic net penalized regression techniques, plus various screening methods. Penalized regression techniques yield a sequence of models and require at least one tuning method to choose the optimum model that has the minimum estimated prediction error. You will learn how to use fit criteria (such as AIC, SBC, and the Cp statistic), average square error on the validation data, and cross validation as tuning methods for penalized regression. Various examples will be provided using the GLMSELECT and HPGENSELECT procedures of SAS/STAT, which offer extensive customization options and powerful graphs for performing statistical model selection.
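To make the idea of "a sequence of models plus a tuning method" concrete, here is a minimal sketch in plain Python rather than SAS. It is not the GLMSELECT or HPGENSELECT procedure: it fits the LASSO for a linear model by coordinate descent (no intercept, objective (1/(2n))·||y − Xb||² + λ·||b||₁) and picks λ by average squared error on validation data. All function names are hypothetical.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """LASSO coefficients by cyclic coordinate descent (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            zj = 0.0   # sum of squares of feature j
            for i in range(n):
                fit_without_j = sum(X[i][k] * b[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
                zj += X[i][j] ** 2
            b[j] = soft_threshold(rho / n, lam) / (zj / n) if zj > 0 else 0.0
    return b


def average_squared_error(X, y, b):
    """Average squared prediction error of coefficients b on (X, y)."""
    errors = [(yi - sum(xij * bj for xij, bj in zip(xi, b))) ** 2
              for xi, yi in zip(X, y)]
    return sum(errors) / len(errors)


def select_lambda(X_train, y_train, X_val, y_val, lambdas):
    """Fit one model per penalty and keep the one with the lowest validation ASE."""
    best = None
    for lam in lambdas:
        b = lasso_cd(X_train, y_train, lam)
        ase = average_squared_error(X_val, y_val, b)
        if best is None or ase < best[1]:
            best = (lam, ase, b)
    return best  # (lambda, validation ASE, coefficients)
```

Larger values of λ shrink more coefficients exactly to zero, which is what makes the LASSO a variable selection method. Swapping the validation ASE for a fit criterion or a cross-validation estimate changes only `select_lambda`.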


T3 Introduction to Spatial Analysis Through Statistics		Sat, Feb 25, 2:00 PM - 4:00 PM River Terrace 3
Instructor(s): Michael Devin Floyd, Saint Software; Phillip Stedman Floyd, Segal Consulting


The word spatial means related to space or geography. Thus, spatial analysis is an analysis that takes into consideration the location of the observation. This course is about using spatial elements to derive conclusions or eliminate dependence based on location. This course assumes no prior knowledge of spatial analysis. It starts from the beginning by defining spatial data and reasoning why spatial analysis is relevant. Different mapping techniques are explored to visualize spatial information to get a better understanding. Tests for spatial dependence in datasets are discussed. Then, it is shown that spatial dependence can influence results. Spatial regression techniques are discussed to mitigate the spatial dependence. For the conclusion, I talk about how accounting for the spatial dependence influenced the research I did at the Louisiana Public Health Institute. Basic knowledge of regression and linear modeling is assumed.
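The abstract does not say which tests for spatial dependence the course covers. As one common example, Moran's I with a permutation p-value can be sketched as below; the choice of statistic, the neighbor weight matrix, and the function names are assumptions made for illustration.

```python
import random


def morans_i(values, weights):
    """Moran's I for observations `values` and an n-by-n spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations between every pair of locations, weighted.
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


def permutation_p_value(values, weights, n_perm=999, seed=0):
    """One-sided p-value for positive spatial autocorrelation.

    Values are shuffled across locations to simulate "no spatial dependence".
    """
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)
```

Here `weights[i][j]` would be 1 when locations i and j are neighbors and 0 otherwise. Values of I well above its no-dependence expectation of −1/(n − 1) suggest that nearby observations resemble each other, which is the situation spatial regression is meant to address.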


T4 How to Find (the Right) Clients for Your Independent Statistical Consulting Business		Sat, Feb 25, 2:00 PM - 4:00 PM River Terrace 2
Instructor(s): Karen Grace-Martin, The Analysis Factor


If you are starting an independent statistical consulting business, you will need to learn many business skills. The most important, yet intimidating, of these is finding and attracting clients. Clients will hire (and re-hire) you only if they know, like, and trust you. This will only happen when you build a solid marketing process that conveys your strengths and what you can offer to the right clients. In this tutorial, you will learn about how to approach and get started creating a simple, yet solid, marketing plan that allows the right clients to know, like, and trust you. The instructor will share her personal experiences and case studies of colleagues who built a consulting business and guide you through small group exercises.


GS2 Closing General Session		Sat, Feb 25, 4:15 PM - 5:30 PM River Terrace 2



The Closing Session is your opportunity to interact with the CSP Steering Committee in an open discussion about how the conference went. CSPSC vice chair, Jean Adams, will lead a panel of committee members as they summarize their conference experience. The audience will then be invited to ask questions and provide feedback. The committee highly values suggestions for improvements gathered during this time. The best student poster will also be awarded during the Closing Session, and each attendee will have an opportunity to win a door prize.

CSP 2017 Online Program


ASA Meetings Department
