CSP 2017 Online Program


Thursday, February 23
Registration		Thu, Feb 23, 7:00 AM - 6:30 PM






SC1 Art and Practice of Classification and Regression Trees		Thu, Feb 23, 8:00 AM - 5:30 PM River Terrace 2
Instructor(s): Wei-Yin Loh, University of Wisconsin


It is more than 50 years since the first regression tree algorithm (AID, Morgan and Sonquist 1963) appeared. Rapidly increasing use of tree models among practitioners has stimulated many algorithmic advances over the last two decades. Modern tree models have higher prediction accuracy, increased computational speed, and negligible variable selection bias. They can fit linear models in the nodes using GLM, quantile, and other loss functions; response variables may be multivariate, longitudinal, or censored; and classification trees can employ linear splits and fit kernel and nearest-neighbor node models. The aims of the course are: (i) to briefly review the capabilities of the state-of-the-art methods and (ii) to show how to exploit free software to analyze data from initial data exploration to a final interpretable prediction model. Example applications include subgroup identification for precision medicine, missing value imputation, and propensity score estimation in sample surveys.
Outline & Objectives Outline: 1. Review of classification trees. Comparisons of algorithms on prediction accuracy, computation, and selection bias. 2. Review of regression trees for least squares, quantile, Poisson, and relative risk regression. Effect of collinearity, nonlinearity, variance heterogeneity, and missing data. 3. Importance scoring of variables. 4. Inference for tree models. Bootstrap trees and confidence intervals. 5. Tree ensembles. 6. Step-by-step analysis of real data using free software, from data exploration to final model. Examples include: (a) Subgroup identification for precision medicine in a breast cancer trial with censored response. (b) Subgroup identification in a diabetes trial with longitudinal responses. (c) Missing value imputation and propensity score estimation for the U.S. Consumer Expenditure Survey. (d) Analysis of data on circuit board soldering from a factorial design. (e) Joint modeling of mother's stress and child's morbidity in a longitudinal study. Objectives: 1. Reveal the power and versatility of tree models. 2. Show how to exploit advanced features of existing software. About the Instructor Wei-Yin Loh has been actively doing research in the subject for almost thirty years. He is the developer or co-developer of the FACT, QUEST, CRUISE, LOTUS and GUIDE algorithms and has supervised more than twenty PhD theses in this area. He has given one and two-day courses on classification and regression trees to professional societies (KDD 1999, 2001; JSM 2007, 2011, 2013, 2015; U.S. 
Army Applied Statistics Conference 1995, 1999; Interface Conference 2013; ASA Northeastern Illinois Chapter 2014; ICSA Applied Statistics Symposium 2015; Midwest Biopharmaceutical Statistics Workshop 2015; Washington Statistical Society 2015), major biopharmaceutical companies, and overseas universities (National University of Singapore 2010 and 2014; East China Normal University 2012; National Tsinghua University, Taiwan, 2012; City University of Hong Kong 2014). He is a consultant on regression tree methods to government and industry. He regularly teaches a semester-long graduate course on the subject at the University of Wisconsin, Madison. Relevance to Conference Goals The course will introduce to beginners a new set of statistical tools that is time-tested and is used increasingly in academia and industry. It will teach non-beginners how to exploit the features of existing software, such as their use in simulation experiments. In addition, the course will teach attendees how to respond to questions about tree models, such as their interpretation and statistical significance, and how they compare with models from traditional methods in terms of prediction accuracy, underlying assumptions, and sensitivity to outliers, collinearity, and missing values. The course will have an immediate positive impact on a statistician's job by increasing his/her array of tools. Software Packages GUIDE: http://www.stat.wisc.edu/~loh/guide.html R packages: rpart, randomForest, party


SC2 Becoming a Student of Leadership in Statistics		Thu, Feb 23, 8:00 AM - 12:00 PM City Terrace 7
Instructor(s): Matthew Gurka, University of Florida; Robert Rodriguez, SAS Institute Inc.; Gary R. Sullivan, Eli Lilly & Company


What is leadership? Much has been written and discussed within the statistics profession in the last few years on the topic and its importance in advancing our profession. This course will provide an introductory understanding of leadership as well as initial direction for statisticians who wish to develop as leaders. It will feature a leader in the statistics profession speaking on their personal journey as well as providing guidance on personal leadership development. You will also be introduced to some important leadership competencies - including influence, business acumen, and communication - and will begin to draft a plan for (1) developing your own leadership or (2) addressing a leadership challenge in your work. Finally, you will spend time reflecting on leadership learnings and networking with other statisticians and practitioners.
Outline & Objectives 1. Gain a better understanding of leadership, including: - How established leaders in our profession have developed their leadership skills; - Insights and perspectives on leadership from other professional statisticians gained through interactions, discussions and group work that will improve each attendee’s ability to lead - Inspiration to become a leader 2. Improved ability to recognize and develop characteristics of leaders, including: - Ideas on how to acquire greater knowledge in organizational/business acumen - Insights into the importance communication plays in leadership; - Other qualities that impact a leader’s ability to influence; 3. Develop a path for your own leadership development, including: - A draft of your own leadership principles and a plan for continuing your leadership journey; - Membership in a small peer leadership group that will continue to share learning, perspectives, experiences and ideas on leadership after completing the workshop About the Instructor Gary R. Sullivan, Ph.D., is the senior director of Non-Clinical Statistics at Eli Lilly and Company, a major pharmaceutical manufacturer headquartered in Indianapolis, IN. Gary joined Lilly in 1989 and has 13 years of experience as a project statistician where he collaborated with pharmaceutical formulators, chemists, biologists, and engineers on formulation design, process optimization & modeling, assay development & characterization, and production monitoring. He has spent the last 14 years in various leadership positions with responsibilities for statisticians collaborating in manufacturing, product development, discovery research, and biomarker research. His personal passions include quality management, experimental design, process optimization and leadership development. 
Gary has developed leadership training for the Biometrics organization at Eli Lilly, proposed and facilitated the initial JSM leadership course in 2014, and is the current chair for the ASA Ad Hoc Leadership Committee. Gary received his Ph.D. in statistics in 1989 from Iowa State University working with Dr. Wayne A. Fuller. Matthew J. Gurka, Ph.D., is a Professor in the Department of Health Outcomes and Policy at the University of Florida (UF), where he is also the Associate Director of the Institute for Child Health Policy. Prior to his recent appointment at UF, Matthew was the Founding Chair of the Department of Biostatistics in the School of Public Health at West Virginia University, where he also led the Clinical Research Design, Epidemiology, and Biostatistics Program of the West Virginia Clinical and Translational Science Institute (WVCTSI). Matthew received a Ph.D. in biostatistics in 2004 from the University of North Carolina at Chapel Hill. In addition to his interests in mixed models and power analysis, Matthew has extensive collaborative and independent research experience in pediatrics. Recently he has focused on childhood and adult obesity, where he has obtained PI funding from the NIH to develop and validate improved measures of the metabolic syndrome. He recently completed a term on the Executive Editorial Board of the journal Pediatrics and is currently on the Editorial Board of the Journal of Pediatrics. Matthew has participated in the JSM leadership course since 2014, and is a member of the ASA Ad Hoc Leadership Committee. Relevance to Conference Goals The above objectives of this course align with the goal of the Conference on Statistical Practice to “provide opportunities for attendees to further their career development and strengthen relationships in the statistical community.” Understanding leadership and learning ways to develop leadership skills further are important for the continued development of one’s career. 
In addition, this course provides the opportunity to build peer relationships among fellow participants who share an interest in leadership. Software Packages NA


SC3 Peering into the Future: Introduction to Time Series Methods for Forecasting		Thu, Feb 23, 8:00 AM - 12:00 PM City Terrace 9
Instructor(s): Dave Dickey, North Carolina State University


This workshop will provide a practical guide to time series analysis and forecasting, focusing on examples and applications in modern software. Students will learn how to recognize autocorrelation when they see it and how to incorporate autocorrelation into their modeling. Models in the ARIMA class and their identification, fitting, and diagnostic testing will be emphasized and extended to models with deterministic trend functions (inputs) and ARMA errors. Diagnosing stationarity, a critical feature for proper analysis, will be demonstrated. After the course, students should be able to identify, fit, and forecast with this class of time series models and be aware of the consequences of having autocorrelated data. They should be able to recognize nonstationary cases in which the differences in the data, rather than the levels, should be analyzed. Underlying ideas and interpretation of output, rather than code, will be emphasized. No previous experience with any particular software is needed. Examples will be computed in SAS, but most modern statistical packages such as SPSS, R, STATA, etc. can be used for time series analysis.
Outline & Objectives Outline of course topics: (1)Identifying and fitting ARMA models, (2)Diagnostics, (3)Incorporating inputs: Regression with Time Series Errors, (4)Intervention Analysis, (5)Nonstationarity: Unit Roots and Stochastic Trends, (Optional: Seasonal models time permitting) Benefits of the course include an understanding of new issues encountered when data are taken over time and how to deal with these issues. Not only are new techniques of analysis necessary, which the student will learn, but additional terminology arises in these cases. Examples and practical interpretation along with the strengths and weaknesses of competing forecasting methodologies will be emphasized. I hope to give examples of interesting data analyses that can be used as templates for analyzing the participants' data when they return home. About the Instructor David A. Dickey received his PhD in statistics in 1976 from Iowa State University working with Dr. Wayne A. Fuller. Their “Dickey-Fuller” test is a part of most modern time series software packages. He is on the ISI’s list of highly cited researchers and is an ASA Fellow. Dickey is William Neal Reynolds Professor of Statistics at North Carolina State University where he does time series research, teaches graduate level methods courses, does consulting, and mentors graduate students. He is coauthor of several books on statistics, including “The SAS System for Forecasting Time Series,” a publication of SAS Institute. He has presented at many conferences including the 2013 ASA Conference on Statistical Practice and several JSM sessions. He has been a contact instructor for SAS Institute since 1981 teaching courses in statistical methodology, including time series, and has helped write some of their course notes. Recently Dickey has been teaching for NC State University's Institute for Advanced Analytics which offers an intensive applied Master’s degree in a 9 month cohort program. 
He has appointments in Economics and the NCSU Financial Math program. Relevance to Conference Goals The student will be better able to communicate intelligently with clients having data taken over time by learning the terms and the concepts behind them. The benefits of being able to better forecast what is going to happen next should be of obvious value to any company collecting data over time. The successful student should be able to carry out an analysis of time dependent data from model identification, through fitting and diagnostic checking, all the way to producing forecasts. Software Packages
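As a small illustration of the nonstationarity point in the course description (analyzing differences rather than levels), the sketch below simulates a random walk and compares the lag-1 sample autocorrelation of the levels with that of the first differences. It is written in Python rather than SAS, and the helper names `autocorr` and `difference`, the seed, and the simulated data are assumptions made for this illustration, not course material.

```python
import random


def autocorr(series, lag=1):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


def difference(series):
    """First differences: y[t] - y[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


# Simulated random walk (illustrative data): each level is the previous level plus noise.
rng = random.Random(1)
walk = []
level = 0.0
for _ in range(500):
    level += rng.gauss(0, 1)
    walk.append(level)

r_levels = autocorr(walk)             # close to 1 for a unit-root series
r_diffs = autocorr(difference(walk))  # close to 0: the differences are white noise
print(f"lag-1 autocorrelation, levels: {r_levels:.3f}")
print(f"lag-1 autocorrelation, differences: {r_diffs:.3f}")
```

A lag-1 autocorrelation near 1 in the levels that drops to near 0 after differencing is one informal sign that the differenced series is the one to model; formal unit root tests such as the Dickey-Fuller test mentioned above are the more careful route.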


SC4 Producing High-Quality Figures in SAS to Meet Publication Requirement, with Practical Examples		Thu, Feb 23, 8:00 AM - 12:00 PM City Terrace 12
Instructor(s): Charlie Chunhua Liu, Allergan PLC


The half-day short course will cover publication requirements for high-quality figures, discuss principles for producing high-quality figures in SAS, and demonstrate how to use both SAS/GRAPH and ODS Graphics procedures to produce some commonly used types of figures (line plots, scatter plots, bee swarm plots, box plots, box plots overlaid with bee swarm plots, etc.). The instructor will also demonstrate how to produce these figures in listing formats (EMF, EPS, etc.) and document formats (RTF, PDF, etc.).
Outline & Objectives The proposed half-day short course will cover the following: 1) Publication requirements for high-quality figures 2) SAS/GRAPH options and formats for high-quality figures 3) Producing high-quality figures in listing formats (EMF, WMF, PS, EPS, etc.) and document formats (PDF, RTF, etc.) 4) Producing some commonly used types of figures, including line plots, jittered scatter plots, line-up jittered scatter plots (also known as bee swarm plots), box plots, and scatter plots overlaid with box plots 5) Producing high-quality figures in the SAS Enterprise Guide (EG) environment About the Instructor Charlie Liu, PhD, is the author of the book "Producing High-Quality Figures Using SAS/GRAPH and ODS Graphics Procedures", published by CRC Press, Taylor & Francis Group, in 2015 (https://www.crcpress.com/Producing-High-Quality-Figures-Using-SASGRAPH-and-ODS-Graphics-Procedures/Liu/9781482207019). Dr. Liu has worked as a SAS programmer and project statistician for more than a decade in several research institutions and pharmaceutical companies, including US EPA, National Institute of Statistical Sciences (NISS), Washington University Medical School at St. Louis, Eli Lilly and Company, Allergan Inc., and Kythera Biopharmaceuticals. He is now an associate director of biostatistics at Allergan PLC. Dr. Liu is an excellent conference speaker and has presented at various SAS user conferences, including the SAS Global Forum 2013 and JSM 2015. He won an Outstanding Speaker Award at the Mid-West SAS User Group (MWSUG) Conference in 2007 in Des Moines, IA. Relevance to Conference Goals The proposed half-day short course will help participants in the areas outlined below. 1) Learn principles and techniques to produce high-quality figures in SAS to meet publication requirements. 2) Learn how to produce some commonly used graphs in SAS, including line plots, scatter plots, bee swarm plots, thunderstorm scatter plots, and box plots. 
3) Have a positive impact on statisticians/programmers to present scientific research data using high-quality graphs Software Packages SAS 9.3 or higher


SC5 Linear Mixed Models Through Health Sciences Applications		Thu, Feb 23, 8:00 AM - 12:00 PM River Terrace 3
Instructor(s): Constantine Daskalakis, Thomas Jefferson University


This course will focus on the heuristic understanding of linear mixed models and their implementation (including assessment of assumptions and model fit, and interpretation of results), rather than formal statistical theory. The following general topics will be covered: a. Specification and interpretation of the fixed effects (population-averaged/mean) model. b. Specification and interpretation of the random effects and their covariance structure (subject-specific effects). c. Considerations regarding the error structure. d. Statistical and graphical methods of assessment of (a), (b), and (c), and model selection strategies. e. Determination, estimation, and testing of linear combinations/contrasts of coefficients to address scientific objectives. f. Writing brief summaries of the results for non-statistical audiences. These topics will be addressed through the analysis of data from two studies: (1) a school-based intervention program designed to impact students’ body mass index (BMI); and (2) an animal xenograft experiment designed to assess the effects of a drug and of radiotherapy on tumor growth.
Outline & Objectives This course is appropriate for an audience that has knowledge of statistics at the level of applied regression. The main requirement is a basic understanding of concepts of confidence intervals, statistical testing, and general regression modeling. The audience may consist of: a. undergraduate or graduate students in statistics and related quantitative fields with biomedical focus; and b. consulting/applied statisticians analyzing multi-level data or longitudinal data with repeated measures in the health sciences. Participants will learn how to apply, evaluate, and interpret linear mixed-effects regression models, through two health sciences applications. Specifically, they will learn how to: a. Perform linear mixed-effects regression modeling in SAS or Stata. b. Specify and perform appropriate comparisons/contrasts. c. Display results in tabular and graphical form. d. Assess model assumptions, evaluate model fit, and compare alternative specifications/models. e. Interpret statistical results (estimates, p-values, etc.) to address scientific objectives. f. Communicate findings to non-statistical audiences. About the Instructor Dr. Daskalakis is Associate Professor of Biostatistics at Thomas Jefferson University and has 15 years of experience as a collaborating statistician in a biomedical environment. He has worked on numerous published studies involving mixed-effects modeling of both hierarchical and longitudinal data. He has taught biostatistics, clinical trials, and regression methods to non-statistical audiences (including a shorter version of the proposed course). Dr. Daskalakis has been very active in ASA’s professional activities through the Section of Teaching of Statistics in the Health Sciences. 
Relevance to Conference Goals The course is aligned with the conference’s second theme, “Data Modeling and Analysis.” In line with the conference’s goal, the course will allow participants to enhance their programming, analysis, and communication skills. It has been designed as a practical hands-on tutorial on linear mixed models, a modern regression approach to the analysis of correlated hierarchical, clustered, and/or longitudinal data. The course may have special value for consulting/applied statisticians who are a large fraction of CSP’s attendees. Software Packages Participants are strongly encouraged to bring their computers for hands-on practice. The course will use SAS (Proc Mixed) and Stata (–mixed–) code and output. Both of those programs have strong capabilities in fitting linear mixed models, with user-friendly modules. In contrast, R’s packages for fitting linear mixed models (nlme and lme4) have limitations, and their syntax and use (beyond fitting simple models) can be quite complicated. For these reasons, R will not be used in the course. Extensive SAS and Stata code for fitting linear mixed models will be provided to participants. R code will also be provided but will not be discussed.


SC6 Text Analytics and Its Applications		Thu, Feb 23, 1:30 PM - 5:30 PM City Terrace 7
Instructor(s): Edward Jones, Texas A&M University


Text analytics refers to the process of deriving actionable insights from text data. This half-day course explores the evolution and creative application of text analytics to solving business problems. Emphasis is placed on how text analytics is used for solving typical forecasting and classification problems by integrating structured and unstructured text data. Solutions are illustrated using SAS Text Miner, R and Python with real world applications in finance and social media.
Outline & Objectives The course is organized into three main sections designed to cover topics ranging in degree of difficulty from the basic to the advanced: 1. Basics of Text Analytics & Techniques for Acquiring and Pre-Processing Text Data 2. Solving the Primary Text Analytics Problem - Topic Analysis 3. Integrating Structured and Unstructured Text Data Participants with statistical programming experience will gain information on incorporating text analysis into their statistical analyses. About the Instructor Dr. Jones has a PhD in Statistics from Virginia Tech and a B.S. in Computer Science from Texas A&M University - Commerce. He has over 10 years of experience developing statistical and data mining software for companies in Silicon Valley and for Rogue Wave Software. He designed and wrote the data mining software incorporated in IMSL, the International Mathematical and Statistics Library. Currently he teaches data mining and analytics at Texas A&M University. He also consults with companies on business analytics and quality assurance, and is co-founder of Texas A&M Statistical Services. Relevance to Conference Goals Software Packages


SC7 Expressing Yourself with R		Thu, Feb 23, 1:30 PM - 5:30 PM City Terrace 9
Instructor(s): Hadley Wickham, RStudio


In this mini-workshop you'll learn how to better express yourself in R. To express yourself clearly in R you need to know how to write high quality functions and how to use a little functional programming (FP) to solve common programming challenges. You'll learn: * The three key properties of a function. * A proven strategy for writing new functions. * How to use functions to reduce duplication in your code. * How `lapply()` works and why it's so important. * A handful of FP tools that increase the clarity of your code. This workshop is suitable for beginning and intermediate R users. You need to know the basics of R (like importing your data and executing basic instructions). If you're an advanced R user, you probably won't learn anything completely new, but you will learn techniques that allow you to solve new challenges with greater ease. The workshop will be hands-on and interactive, so please make sure to bring along your laptop with R installed!
Outline & Objectives In this mini-workshop you'll learn how to better express yourself in R. To express yourself clearly in R you need to know how to write high quality functions and how to use a little functional programming (FP) to solve common programming challenges. About the Instructor Hadley is Chief Scientist at RStudio and a member of the R Foundation. He builds tools (both computational and cognitive) that make data science easier, faster, and more fun. His work includes packages for data science (the tidyverse: ggplot2, dplyr, tidyr, purrr, readr, ...), and principled software development (roxygen2, testthat, devtools). He is also a writer, educator, and frequent speaker promoting the use of R for data science. Learn more on his website, http://hadley.nz. Relevance to Conference Goals Modern data analysis must be performed on a computer, and if you're doing data analysis on a computer, it's worth the investment to learn a programming language. In this tutorial, you'll learn some useful tools in R that improve your ability to automate repeated parts of your analyses. Software Packages R + purrr package


SC8 Missing Data Analysis with R/SAS/Stata		Thu, Feb 23, 1:30 PM - 5:30 PM City Terrace 12
Instructor(s): Din Chen, The University of North Carolina at Chapel Hill; Frank Liu, Merck Research Labs


Missing data are near universal in applied research. Almost all applied researchers have faced the problems of missing data at some point. However, not all the researchers assessed missingness or used appropriate ways to deal with the missing data. Instead, researchers often drop the missing values (e.g., listwise deletion), which reduces the sample size, lowers statistical power, or use ad-hoc single imputation such as LOCF for simplicity. Both approaches introduce the possibility of biased parameter estimations. Such inefficient and potentially biased statistical inference would lead to erroneous research conclusions. This short course aims to address the problems of missing data. The concept of different missing data mechanisms or typologies including missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR) will be discussed with illustration from real clinical trial examples. Moreover, this short course will introduce how to conduct two commonly-used model-based methods for missing data analysis including the multiple imputation (Little & Rubin, 2002; Reiter & Raghunathan, 2007) and maximum likelihood (Allison, 2012) using R/SAS. Some sensitivity analysis approaches to handle missing data under MNAR will also be discussed briefly.
Outline & Objectives This short course will: a) review missing data issues and different missing data mechanisms (i.e., MCAR, MAR and MNAR); b) introduce the multiple imputation and maximum likelihood methods in regression modeling and discuss the advantages and disadvantages of each method; c) illustrate using R/SAS to analyze real data from clinical trial studies and compare the consistency of results across software. About the Instructor Dr. Din Chen is a Fellow of ASA. He is now the Wallace H. Kuralt distinguished professor, Director of Consortium for Statistical Development and Consultation, School of Social Work and professor in biostatistics at the Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill. He was a professor in biostatistics at the University of Rochester and the Karl E. Peace endowed eminent scholar chair in biostatistics at Georgia Southern University. Professor Chen is also a senior statistics consultant for biopharmaceuticals and government agencies with extensive expertise in clinical trials and bioinformatics. He has more than 100 refereed professional publications and has co-authored 10 books in clinical trial methodology and public health applications. Professor Chen was honored with the "Award of Recognition" in 2014 by the Deming Conference Committee for highly successful biostatistics workshop tutorials. His “Applied Meta-analysis” short course in 2016 at CSP was well received by his attendees. Dr. Frank Liu is Distinguished Scientist, Biostatistics at Merck Research Laboratories. He has over 21 years of pharmaceutical industry experience supporting multiple therapeutic areas including neuroscience, psychiatry, infectious disease, and vaccine products. His research interests include methods for longitudinal trials, missing data issues, safety data analysis, and design and analysis of non-inferiority trials. 
He has published more than 30 statistical papers/book chapters, and regularly presents talks at professional meetings. He has co-led a subteam for Bayesian missing data analysis in DIA Bayesian Working Groups, and co-taught short courses on “Sensitivity analysis using Bayesian and Imputation Approaches” at Deming conference 2015, and on “SAS Biopharmaceutical Applications” at Regulatory-Industry Workshop 2016. He has been leading working groups within Merck on developing several guidance documents on analysis of missing data. Relevance to Conference Goals 1. To present up-to-date developments in missing data analysis and guide participants in learning multiple imputation and maximum likelihood methods 2. To give an overview of R/SAS implementations for missing data analysis 3. To emphasize the applied aspects of dealing with missing data through real examples and help attendees solve real-life problems from their research and consulting as applied statisticians and analysts Software Packages R, SAS
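For readers unfamiliar with multiple imputation, the final step is to combine the results from the separately analyzed imputed data sets. The sketch below shows the standard combining rules usually attributed to Rubin: the pooled estimate is the average of the per-data-set estimates, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The course description does not spell out these rules; the function name `pool_rubin` and the numbers are hypothetical and chosen only for illustration, and Python is used here although the course itself works in R and SAS.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Combine m point estimates and their squared standard errors.

    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    q_bar = statistics.fmean(estimates)        # pooled point estimate
    within = statistics.fmean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between     # total variance
    return q_bar, math.sqrt(total)


# Hypothetical results from m = 5 imputed data sets (illustrative numbers only).
estimates = [1.92, 2.10, 2.05, 1.98, 2.15]
variances = [0.040, 0.038, 0.042, 0.041, 0.039]
pooled_estimate, pooled_se = pool_rubin(estimates, variances)
print(f"pooled estimate: {pooled_estimate:.3f}, pooled SE: {pooled_se:.3f}")
```

The pooled standard error is larger than a typical single-data-set standard error because the between-imputation term carries the extra uncertainty due to the missing values, which single imputation methods such as LOCF ignore.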


SC9 Bootstrap Methods and Permutation Tests		Thu, Feb 23, 1:30 PM - 5:30 PM River Terrace 3
Instructor(s): Tim Hesterberg, Google


We begin with a graphical approach to bootstrapping and permutation testing, illuminating basic statistical concepts of standard errors, confidence intervals, p-values and significance tests. We consider a variety of statistics (mean, trimmed mean, regression, etc.), and a number of sampling situations (one-sample, two-sample, stratified, finite-population), stressing the common techniques that apply in these situations. We'll look at applications from a variety of fields, including telecommunications, finance, and biopharm. These methods let us do confidence intervals and hypothesis tests when formulas are not available. This lets us do better statistics, e.g. use robust methods (we can use a median or trimmed mean instead of a mean, for example). They can help clients understand statistical variability. And some of the methods are more accurate than standard methods.
Outline & Objectives Introduction to Bootstrapping General procedure Why does bootstrapping work? Sampling distribution and bootstrap distribution Bootstrap Distributions and Standard Errors Distribution of the sample mean Bootstrap distributions of other statistics Simple confidence intervals Two-sample applications How Accurate Is a Bootstrap Distribution? Bootstrap Confidence Intervals Bootstrap percentiles as a check for standard intervals More accurate bootstrap confidence intervals Significance Testing Using Permutation Tests Two-sample applications Other settings Wider variety of statistics Variety of applications Examples where things go wrong, and what to look for Wider variety of sampling methods Stratified sampling, hierarchical sampling Finite population Regression Time series Participants will learn how to use resampling methods: * to compute standard errors, * to check the accuracy of the usual Gaussian-based methods, * to compute both quick and more accurate confidence intervals, * for a variety of statistics and * for a variety of sampling methods, and * to perform significance tests in some settings. About the Instructor Dr. Tim Hesterberg is a Senior Statistician at Google. He previously worked at Insightful (S-PLUS), Franklin & Marshall College, and Pacific Gas & Electric Co. He received his Ph.D. in Statistics from Stanford University, under Brad Efron. Hesterberg is author of the "Resample" package for R and primary author of the "S+Resample" package for bootstrapping, permutation tests, jackknife, and other resampling procedures, is co-author of Chihara and Hesterberg "Mathematical Statistics with Resampling and R" (2011), and is lead author of "Bootstrap Methods and Permutation Tests" (2010), W. H. Freeman, ISBN 0-7167-5726-5, and technical articles on resampling. See http://www.timhesterberg.net/bootstrap. 
Hesterberg is on the executive boards of the National Institute of Statistical Sciences and the Interface Foundation of North America (Interface between Computing Science and Statistics). He teaches kids to make water bottle rockets, leads groups of high school students to set up computer labs abroad, and actively fights climate chaos. Relevance to Conference Goals Resampling methods are important in statistical practice, but have been omitted or poorly covered in many old-style statistics courses. These methods are an important part of the toolbox of any practicing statistician. It is important when using these methods to have some understanding of the ideas behind them, to understand when they should or should not be used. They are not a panacea. People tend to think of bootstrapping in small samples, when they don't trust the central limit theorem. However, the common combination of nonparametric bootstrap and percentile intervals is actually less accurate than t procedures. We discuss why, remedies, and better procedures that are only slightly more complicated. These tools also show how poor common rules of thumb are -- in particular, n >= 30 is woefully inadequate for judging whether t procedures should be OK. Software Packages I mention the R resample and boot packages, but this is not a focus of the course.
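The two basic procedures named in the course description can be sketched in a few lines. The example below is a minimal illustration in Python, not the instructor's code and not the R resample package: it computes a simple bootstrap percentile interval for a trimmed mean and a two-sample permutation test for a difference in means. The function names, the number of resamples, the seeds, and both data sets are assumptions made up for this sketch.

```python
import random
import statistics


def trimmed_mean(values, trim=0.1):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] if k > 0 else ordered
    return statistics.fmean(kept)


def bootstrap_percentile_ci(values, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile interval: resample with replacement, then take empirical quantiles."""
    rng = random.Random(seed)
    n = len(values)
    replicates = sorted(
        statistic([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = replicates[int((alpha / 2) * n_boot)]
    upper = replicates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in means between two samples."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(x) - statistics.fmean(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of observations to the two groups
        diff = abs(statistics.fmean(pooled[:len(x)]) - statistics.fmean(pooled[len(x):]))
        if diff >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Made-up skewed sample with one outlier (illustrative data only).
sample = [2.1, 2.4, 2.5, 2.7, 2.9, 3.0, 3.2, 3.3, 3.6, 9.8]
ci = bootstrap_percentile_ci(sample, trimmed_mean)

# Made-up two-group data (illustrative only).
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3]
group_b = [6.2, 6.8, 7.1, 6.5, 7.4, 6.9]
p = permutation_p_value(group_a, group_b)
print("percentile interval for the trimmed mean:", ci)
print("permutation p-value:", p)
```

The percentile interval is the quick version; as the description notes, it can be less accurate than t procedures in small samples, which is why the course goes on to more accurate bootstrap intervals.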


PS1 Poster Session 1 and Opening Mixer		Thu, Feb 23, 5:30 PM - 7:00 PM Conference Center AB


Chair(s): Nancy Wang, Celerion

	1 Web Scraping Government Tax Revenue with Machine Learning View Presentation Brian Arthur Dumbacher, U.S. Census Bureau
	2 A Hierarchical Clustering Analysis (HCA) in Automatic Driving Regarding Vehicle-to-Vehicle Pedestrian Position Identification View Presentation Jie Xue, Purdue University
	3 Communicating Statistics to Nonstatisticians View Presentation Kim Love, K. R. Love Quantitative Consulting and Collaboration
	4 Good Statistical Practices: An Example of Meta-Analysis of Odds Ratios View Presentation Bei-Hung Chang, University of Massachusetts Medical School
	5 A New Method to Assess Measurement Agreement in Machine Readings Dong-Yun Kim, NHLBI/NIH
	6 Smoking Tendencies Among Junior High-School Students in Ghana: Applications of ROC Curve and AUC View Presentation Emmanuel Thompson, Southeast Missouri State University
	8 An Application of Competing Risk Analysis in Large Cardiovascular Clinical Trials Purva Jain, Beth Israel Deaconess Medical Center, Harvard Medical School
	9 Profile Monitoring for Poisson Data with Fixed Effects Using Nonparametric Methods Sepehr Piri, Virginia Commonwealth University
	10 Interaction of Measurement Burden and Disease in the ENRICHD Clinical Trial Su-Yun Han, NHLBI/NIH
	12 Probabilistic Record Linkage in R and Stata View Presentation Anders R Alexandersson, Florida Cancer Data System
	13 Creating Reproducible Tables in R Markdown View Presentation Claire Palmer, University of Colorado at Denver School of Medicine
	14 More Than Meets the Eye: Bayesian Inference in Nonparanormal Graphical Models Jami Jackson Mulgrave, North Carolina State University
	15 Communicating with Clinicians About Models That Predict Risk Using Interactive Web Graphics View Presentation Marshall Brown, Fred Hutchinson Cancer Research Center
	16 R Resample Package View Presentation Tim Hesterberg, Google
	17 StatTag: A Reproducible Research Tool for Generating Dynamic Documents Using Microsoft Word View Presentation Abigail S Baldridge, Northwestern University


Exhibits Open		Thu, Feb 23, 5:30 PM - 7:00 PM Conference Center AB






Friday, February 24
Registration		Fri, Feb 24, 7:30 AM - 5:30 PM






Continental Breakfast		Fri, Feb 24, 7:30 AM - 8:30 AM Conference Center AB






Exhibits Open		Fri, Feb 24, 7:30 AM - 6:30 PM Conference Center AB






GS1 Keynote Address		Fri, Feb 24, 8:00 AM - 9:00 AM River Terrace 1


Chair(s): MoonJung Cho, Bureau of Labor Statistics

	Snakes and Ladders: Challenges in Forging a Career in Statistics View Presentation David Lane Banks, Duke University


CS01 Presentation and Storytelling		Fri, Feb 24, 9:15 AM - 10:45 AM River Terrace 2


Chair(s): Cynthia R. Long, Palmer College of Chiropractic

9:20 AM	The Statistician’s Role in Data Storytelling Projects: Case Studies and Best Practices Haviland Wright, Boston University
10:05 AM	Statistical Presentation Power: How to Reveal Your 'X Factor'!!! Jennifer H Van Mullekom, Laboratory for Interdisciplinary Statistical Analysis (LISA), Virginia Tech


CS02 Beyond the Basics: Advanced Modeling Methods		Fri, Feb 24, 9:15 AM - 10:45 AM River Terrace 3


Chair(s): Shankang Qu, PepsiCo

9:20 AM	Don't Be Silly; Do It Bayesian Perceval Sondag, Arlenda
10:05 AM	Improve Regression and Communicate Results Using Stochastic Gradient Boosting and LASSO View Presentation Charles William Harrison, Salford Systems


CS03 Data Wrangling and Visualization		Fri, Feb 24, 9:15 AM - 10:45 AM City Terrace 7


Chair(s): Robert P. Yerex, University of Virginia Medical Center

9:20 AM	Discover and Visualize the Golden Paths, Unique Sequences, and Marvelous Associations Out of Your Big Data Using Link Analysis in SAS Enterprise Miner View Presentation Delali Agbenyegah, Alliance Data Card Services
10:05 AM	Data Scraping, Parsing, Wrangling, and Cleaning View Presentation Mark Daniel Ward, Purdue University


CS04 Keep It Simple with R		Fri, Feb 24, 9:15 AM - 10:45 AM City Terrace 9




9:20 AM	Reproducibility in Action View Presentation Richard Thomas Schwinn, U.S. Small Business Administration
10:05 AM	Managing Many Models Hadley Wickham, RStudio


CS05 Statistical Collaboration		Fri, Feb 24, 11:00 AM - 12:30 PM River Terrace 2


Chair(s): Michael Latta, YTMBA Research & Consulting and Coastal Carolina University

11:05 AM	Practical Examples and Challenges of Statistical Consulting in Health Settings Laura H Gunn, Stetson University
11:50 AM	Panel Discussion on Statistical Volunteers David J Corliss, Peace-Work


CS06 Business Intelligence Practices		Fri, Feb 24, 11:00 AM - 12:30 PM River Terrace 3


Chair(s): Madhuri Mulekar, University of South Alabama

11:05 AM	On the Street: Conducting Business Research View Presentation Joyce Nilsson Orsini, Fordham University GBA
11:50 AM	Bridging the Gap on Multi-Channel Attribution View Presentation John Lin, Epsilon Data Management


CS07 Surveys and Sentiment Analysis		Fri, Feb 24, 11:00 AM - 12:30 PM City Terrace 7


Chair(s): Susan Simmons, NC State Institute for Advanced Analytics

11:05 AM	The Nexus Between Data Science, Survey Design, and Statistical Practice View Presentation Steven B Cohen, RTI International
11:50 AM	Sentiment Analysis of Brand Social Mentions: The Polarity Classification and Beyond View Presentation Jin Su, Johnson & Johnson Vision Care, Inc.


CS08 Interactivity with R Shiny		Fri, Feb 24, 11:00 AM - 12:30 PM City Terrace 9


Chair(s): Edward Mulrow, NORC at the University of Chicago

11:05 AM	Working with Shiny Things View Presentation Harlen Hays, Cerner Corporation
11:50 AM	Rapid Data Visualization and Dissemination Using R and Shiny View Presentation Bogdan Alexandru Rau, UCLA Center for Health Policy Research


Lunch (on own)		Fri, Feb 24, 12:30 PM - 2:00 PM






CS09 Organizational Impact		Fri, Feb 24, 2:00 PM - 3:30 PM River Terrace 2


Chair(s): Kathy Hanford, University of Nebraska-Lincoln

2:05 PM	Developing a Data Science Center of Excellence (DS CoE) View Presentation Celeste R Fralick, Unaffiliated
2:50 PM	My Marathon Journey for Analytics Change View Presentation Terri Henderson, Johnson & Johnson Vision Care, Inc.


CS10 Probability Distributions		Fri, Feb 24, 2:00 PM - 3:30 PM River Terrace 3


Chair(s): Kathleen Jablonski, The George Washington University

2:05 PM	Modeling Proportions and Probabilities: The Beta Distribution Is Your Friend View Presentation Paul Teetor, William Blair & Co.
2:50 PM	Probability Density for Repeated Events View Presentation Bruce Stephen Lund, Magnify Analytic Solutions


CS11 Text Analytics		Fri, Feb 24, 2:00 PM - 3:30 PM City Terrace 7


Chair(s): Laura H Gunn, Stetson University

2:05 PM	Predicting Regulatory Risk from Unstructured Text Data View Presentation Danielle Leigh Boree, Johnson & Johnson Vision Care, Inc.
2:50 PM	Using Text Analytics and Signal Detection to Predict Medical Device Recalls View Presentation Lisa Ensign, Significant Statistics


CS12 Going Big on Bayesian		Fri, Feb 24, 2:00 PM - 3:30 PM City Terrace 9


Chair(s): Alok Kumar Dwivedi, Texas Tech University Health Sciences Center

2:05 PM	Introduction to Bayesian Analysis Using Stata View Presentation Chuck Huber, StataCorp
2:50 PM	Bayesian Structural Equation Modeling View Presentation M'hamed Hamy Temkit, Mayo Clinic


CS13 Career and Personal Development		Fri, Feb 24, 3:45 PM - 5:15 PM River Terrace 2


Chair(s): Rich Newman, Johnson & Johnson Vision Care, Inc.

3:50 PM	Soft Skills for Succeeding Outside of Academia View Presentation Diahanna L Post, Nielsen
4:35 PM	Career Development for Statisticians in a Collaborative Environment: Importance of Effective Mentoring and Development of Soft Skills Jay N Mandrekar, Mayo Clinic


CS14 Addressing Statistical Problems and Issues		Fri, Feb 24, 3:45 PM - 5:15 PM River Terrace 3


Chair(s): Bonita Singal, United States Department of Energy

3:50 PM	Data Preparation: The Key for Meaningful Insights Huiyu Qian, AutoAnything Inc.
4:35 PM	Matched Case-Control Data Analysis Yinghui Duan, Connecticut Institute for Clinical and Translational Science


CS15 Machine Learning		Fri, Feb 24, 3:45 PM - 5:15 PM City Terrace 7


Chair(s): John Stevens, Utah State University

3:50 PM	Tree-Based Techniques for High-Dimensional Data View Presentation Wei-Yin Loh, University of Wisconsin
4:35 PM	Intro to Deep Learning with TensorFlow View Presentation Denisa A.O. Roberts, ASAPP Inc.


CS16 Generalized Linear Mixed Models with R		Fri, Feb 24, 3:45 PM - 5:15 PM City Terrace 9


Chair(s): Doug Lehmann, University of Miami

3:50 PM	Constructing and Analyzing Generalized Linear Mixed Models View Presentation Christina P Knudson, Macalester College
4:35 PM	Simulation and Power Analysis of Generalized Linear Mixed Models Brandon LeBeau, University of Iowa


PS2 Poster Session 2 and Refreshments		Fri, Feb 24, 5:15 PM - 6:30 PM Conference Center AB


Chair(s): Huanjun Zhang, Texas A&M University

	1 Use of Longitudinal Models to Identify Subject-Specific Implausible Body Mass Index Measures: A Comparison with Screening for Population-Level Outliers View Presentation Carrie Tillotson, OCHIN, Inc.
	2 Enhancing Monthly Retail Holiday Effect Methodology Through Daily Data View Presentation Rebecca Jean Hutchinson, U.S. Census Bureau
	3 Statistics for Public Policy: Reflecting on Change in the Last 50 Years View Presentation Karen Moran Jackson, The University of Texas at Austin
	4 Data Science, Statistics, Analytics, Data Engineering: What Does It All Mean? View Presentation Michael Latta, Coastal Carolina University
	5 Iterative Semiparametric Generalized Linear Models Busayasachee Puang-Ngern, Macquarie University
	6 Expanding the Appeal of Model Selection Using Mixture Priors to Incorporate Expert Opinion: A Behavioral Economic Case Study Christopher T Franck, Virginia Tech
	7 Performance of Data Mining Methods in an Example with Ordinal and Imbalanced Data View Presentation Elena Rantou, FDA
	8 Additive P-Value Combinations and an Application in Consumer Product Research View Presentation Georgette Asherman, Direct Effects, LLC
	9 Pseudo-Maximum Likelihood Estimation with Sampling Weight for Modeling Count Data from a Complex Survey View Presentation Lin Dai, Medical University of South Carolina
	11 Two-Step Logistic Regression Model for Predicting Phone Campaign Response View Presentation Sharon (Renting) Xu, AARP, Inc.
	12 Detecting Interaction in Two-Way Unreplicated Experiments via Bayesian Model Selection View Presentation Thomas Anthony Metzger, Virginia Tech
	13 Analyzing Shot Data with MANOVA View Presentation Victoria Cox, Dstl
	14 What Effort Is Needed in Explaining Statistical Results to Pediatric Researchers? A Survey to Better Understand the Confidence and Knowledge of Pediatric Researchers View Presentation Curtis Dean Travers, Emory University School of Medicine
	15 Rapid Data Visualization and Dissemination Using R and Shiny View Presentation Bogdan Alexandru Rau, UCLA Center for Health Policy Research
	16 ShinySurvival: An Interactive Tool for Visualizing and Analyzing Survival Data in R View Presentation Felicia Powell Hardnett, CDC
	17 Side-by-Side Bar Charts for More Than One Variable on Different Scales Using SAS SGPLOT View Presentation John Stephen Taylor, Johnson & Johnson Vision Care, Inc.
	18 Good Old Excel: Using an Old Favorite to Explore, Visualize, and Share Data View Presentation Nola du Toit, NORC at the University of Chicago


Saturday, February 25
Registration		Sat, Feb 25, 7:30 AM - 2:30 PM






Exhibits Open		Sat, Feb 25, 7:30 AM - 1:00 PM Conference Center AB






PS3 Poster Session 3 and Continental Breakfast		Sat, Feb 25, 8:00 AM - 9:15 AM Conference Center AB


Chair(s): Michael Devin Floyd, Saint Software

	1 Variations in Statistical Practice Between North-American Stat Labs View Presentation Eric Vance, University of Colorado at Boulder
	2 Not Just a Statistician: Experience on How to Communicate with Your Client View Presentation Kate Wan-Chu Chang, University of Michigan
	3 Listwise Deletion or Multiple Imputation When Complex Sample Data Are MCAR or MAR: A Guide to Selecting an Appropriate Missing Data Treatment Method View Presentation Anh P. Kellermann, University of South Florida
	4 Analysis of Bird Arrival Dates in Cayuga County View Presentation Caitlin Mary Cunningham, Le Moyne College
	5 A Guide to Modeling Strategies for Immunological Count Data View Presentation Claire Palmer, University of Colorado at Denver School of Medicine
	6 Blending Big Data Visualization Tools with Statistical Analysis: Improving Automotive Lubricants View Presentation Jim McAllister, Afton Chemical Corporation
	7 Partial Least Squares Regression Analysis Identifies Interleukin-1 Receptor as a Predictor of Airway Neutrophils in Asthma View Presentation Michael David Evans, University of Wisconsin-Madison
	8 Data Fusion Techniques for Estimating the Relative Abundance of Rare Species View Presentation Purna Gamage, Texas Tech University
	9 Statistical Comparison of Particle Size Distributions View Presentation Scott J Richter, The University of North Carolina at Greensboro
	10 Optimal Experimental Designs for Mixed Categorical and Continuous Responses View Presentation Soohyun Kim, Arizona State University
	11 Balanced Salary Structure Modeling View Presentation Thor Dane Osborn, Sandia National Laboratories
	12 Sequential Pattern Mining in Real-Time Marketing with Backward Match Algorithm View Presentation Yi Cao, Alliance Data Card Services
	14 Using Shiny to Efficiently Process Survey Data View Presentation Carl Ganz, UCLA Center for Health Policy Research
	15 Integrating R Programming Platforms into Community Collective Impact Efforts to Solve Social Problems View Presentation Frank M Ridzi, Central New York Community Foundation and Le Moyne College
	16 Generating Tables and Statistical Summary Using PROC REPORT View Presentation Lei Zhang, University of Minnesota
	17 A Nonlinear Regression Plugin for Rcmdr View Presentation Thomas Edward Burk, University of Minnesota
	18 Predict Warriors' 73-Win on April 13, 2016 View Presentation Jason Li, Morrill Learning Center


CS17 Ethical Guidelines		Sat, Feb 25, 9:15 AM - 10:45 AM River Terrace 2


Chair(s): Constantine Daskalakis, Thomas Jefferson University

9:20 AM	How to Deal with Ethical Issues of Human Subjects, Difficult Colleagues, and Networking View Presentation Michael Latta, YTMBA Research & Consulting and Coastal Carolina University
10:05 AM	How the New ASA Guidelines Help Practicing Statisticians View Presentation Alan C. Elliott, Southern Methodist University


CS18 Going Mainstream: Emerging Modeling Methods		Sat, Feb 25, 9:15 AM - 10:45 AM River Terrace 3


Chair(s): Viswanathan Ramakrishnan, Medical University of South Carolina

9:20 AM	Amazon Product Co-Purchasing Network Estimation Through ERGM Model Using Reference Prior View Presentation Sayan Chakraborty, Michigan State University
10:05 AM	Integrating Text Analytics with Traditional Structured Analytics Edward Jones, Texas A&M University


CS19 Guided and Automatic Model Selection		Sat, Feb 25, 9:15 AM - 10:45 AM City Terrace 7


Chair(s): Inyoung Kim, Virginia Tech

9:20 AM	Best Practices in Model Selection and Profiling View Presentation Scott Lee Wise, SAS Institute, Inc.
10:05 AM	Designing Automated Workflows for Model Selection and Optimization View Presentation Christian Kendall, Salford Systems


CS20 A Graph Is Worth a Thousand Words		Sat, Feb 25, 9:15 AM - 10:45 AM City Terrace 9


Chair(s): Andrew D. Althouse, University of Pittsburgh Medical Center

9:20 AM	Geospatial Analysis with R View Presentation Michael Jadoo, Unaffiliated
10:05 AM	How to Avoid Some Common Graphical Mistakes View Presentation Naomi B. Robbins, NBR


CS21 Communicating to Motivate and Influence		Sat, Feb 25, 11:00 AM - 12:30 PM River Terrace 2


Chair(s): Yuanyuan Tang, Saint Luke's Health System

11:05 AM	The Psychology of Influence View Presentation Colleen Mangeot, Cincinnati Children's Hospital Medical Center
11:50 AM	Strategic Marketing and Communication for Statistical Consultants and Collaborators View Presentation Renita Canady, Association of American Medical Colleges


CS22 In THIS Corner: X1! When Model Variables Compete		Sat, Feb 25, 11:00 AM - 12:30 PM River Terrace 3


Chair(s): Michael Reiger, West Virginia University

11:05 AM	Estimating with Weights: Common Sense vs. Unbiasedness View Presentation Tim Hesterberg, Google
11:50 AM	Competing Risk Data and Semi-Competing Risk Data Analysis and Visualization in SAS and R View Presentation Ran Liao, Indiana University


CS23 Latent Variable and Mixed Effects Models		Sat, Feb 25, 11:00 AM - 12:30 PM City Terrace 7


Chair(s): Xianggui (Harvey) Qu, Oakland University

11:05 AM	Nonparametric Mixed-Effects Regression for Large Samples View Presentation Nathaniel Erik Helwig, University of Minnesota
11:50 AM	Optimization of Processes and Products from Historical (Un-Designed) Data View Presentation John F. MacGregor, ProSensus, Inc.


CS24 It's a Package Deal		Sat, Feb 25, 11:00 AM - 12:30 PM City Terrace 9


Chair(s): John Castelloe, SAS Institute

11:05 AM	Introduction to JMP Software Terrie Vasilopoulos, University of Florida
11:50 AM	Logistic Regression Cross-Package Comparison View Presentation Lillian Ma, Capital One Bank


Lunch (on own)		Sat, Feb 25, 12:30 PM - 2:00 PM






PCD1 Power and Sample Size Analysis Using Stata		Sat, Feb 25, 2:00 PM - 4:00 PM City Terrace 6
Instructor(s): Chuck Huber, StataCorp


Power and sample size analysis is a fundamental step in the planning of any research project. This talk will demonstrate how to use Stata's power command to calculate power, sample size and minimum detectable effect size. We will show how to create customized tables and graphs for many study designs with both continuous and categorical outcomes. We will also demonstrate how to add your own methods to the power command and how to calculate power for multilevel/longitudinal studies using simulation.
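As a rough illustration of the kind of calculation described above, the sketch below computes the sample size and power for a two-sided comparison of two means. It is written in plain Python and uses the textbook normal approximation with equal group sizes and a known common standard deviation; it is not Stata's power command and will not necessarily match its output, which may rely on t-based methods. The function names and the example effect size are assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Per-group sample size to detect a mean difference delta (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


def approx_power(n, delta, sigma, alpha=0.05):
    """Approximate power with n subjects per group (ignores the far rejection tail)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf(abs(delta) / (sigma * sqrt(2 / n)) - z_alpha)


# Hypothetical design: detect a difference of half a standard deviation.
n = n_per_group(delta=0.5, sigma=1.0)
print(n, round(approx_power(n, delta=0.5, sigma=1.0), 3))
```

Fixing any two of sample size, power, and effect size determines the third, which is why the same formula can be rearranged to give a minimum detectable effect size.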


PCD2 Xymp: A Web Application Supporting Best Practices in Bioassay		Sat, Feb 25, 2:00 PM - 4:00 PM City Terrace 8
Instructor(s): David Lansky, Precision Bioassay, Inc.


The software system consists of three components (each on a different virtual server): a web application (written in PHP), a database, and a collection of R programs, packages, and reports (Sweave and knitr). The system helps users perform randomized instances of routine bioassays, performs mixed model analyses (using linear or non-linear models), and produces reports (including summaries). The statistical portion works well with simple or complex designs (from CRD to a strip-unit). The system contains a lot of features to meet regulatory requirements (users with different levels of authorization, automatic tracking and reporting of re-analyses of data, etc.). The system is designed to be very easy for routine use in the lab, while providing a rich collection of modern statistical capabilities. The system is designed to facilitate good collaboration between bioassay scientists and statisticians. Each assay has a protocol, each analysis has a protocol, and the protocols capture all the statistical details; the lab users select protocols by name. The statisticians build the protocols.


PCD3 Marketing Mix Modeling and Optimization Using Bayesian Networks and BayesiaLab		Sat, Feb 25, 2:00 PM - 4:00 PM City Terrace 12
Instructor(s): Stefan Conrady, Bayesia USA


“Half the money I spend on advertising is wasted; the trouble is I don’t know which half.” Over the last century, various versions of this quote have been attributed to John Wanamaker, Henry Ford, and Henry Procter, among others. Yet, 100 years after these marketing pioneers, in this day and age of big data and advanced analytics, the quote still rings true among marketing executives. The ideal composition of advertising and marketing efforts remains the industry's Holy Grail. The current practice remains “more art than science.” The lack of a well-established marketing mix methodology has little to do with the domain itself. Rather, it reflects the fact that marketing is yet another domain that typically has to rely on non-experimental data for decision support. The single most important thing we need to recognize about marketing mix modeling is that it is a causal question. This means, we are not looking for a prediction of an outcome variable based on the observation of marketing variables. Rather, we are looking to manipulate marketing variables to optimize an outcome variable. Thus, we are performing an intervention, which requires us to perform causal inference. This leads us to the Holy Grail of statistics, i.e. causal inference from observational data. In this workshop, we introduce the basic concepts of graphical models and how they can help us perform causal identification, e.g. using causal assumptions and the well-known Adjustment Criterion. While this is straightforward in theory, the complexity of the marketing domain prevents the practical application of this criterion. Thus, we introduce a new criterion (Shpitser and VanderWeele, 2011) that reduces the number of assumptions that we require for confounder selection and causal identification. Implementation with BayesiaLab With the confounders identified, we can now build a high-dimensional statistical model that represents the joint probability distribution of all marketing variables. 
We do that using the machine-learning algorithms of the BayesiaLab software platform. We obtain a Bayesian network that represents a multitude of relationships between all marketing variables and the outcome variable. Using BayesiaLab’s visualization functions, we can compare the machine-learned graph to our understanding of the domain. Furthermore, we can examine the (mostly nonlinear) response curves of the outcome variable as a function of the marketing variables. Most importantly, we use BayesiaLab to perform Likelihood Matching on all confounders to establish the causal response of the outcome variable. With all causal response curves computed, we introduce cost functions for the marketing variables via BayesiaLab’s Function Node. On that basis, we proceed to BayesiaLab’s Target Optimization function, which, by means of a genetic algorithm, searches for an optimal combination of all marketing variables, while being subject to constraints of individual variables and an overall marketing budget constraint. The optimization report shows feasible solutions along with the degree of achievement.
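The distinction drawn above between predicting an outcome and intervening on a marketing variable can be illustrated with a toy calculation. The sketch below is plain Python and uses simple stratified adjustment over a single, fully observed confounder; it is not BayesiaLab's Likelihood Matching and not the Shpitser and VanderWeele criterion. All counts, the variable meanings (z as a high or low season, x as advertising on or off), and the function names are invented for illustration.

```python
# counts[(z, x)] = (number of units, number of sales); hypothetical data.
counts = {
    (1, 1): (80, 64),  # high season, advertising on
    (1, 0): (20, 14),  # high season, advertising off
    (0, 1): (20, 6),   # low season, advertising on
    (0, 0): (80, 16),  # low season, advertising off
}


def naive_rate(x):
    """Observed sales rate among units with advertising level x (no adjustment)."""
    units = sum(n for (_, xx), (n, _) in counts.items() if xx == x)
    sales = sum(s for (_, xx), (_, s) in counts.items() if xx == x)
    return sales / units


def adjusted_rate(x):
    """Stratified adjustment: sum over z of P(z) * P(sale | x, z)."""
    total = sum(n for n, _ in counts.values())
    rate = 0.0
    for z in sorted({z for z, _ in counts}):
        n_z = sum(n for (zz, _), (n, _) in counts.items() if zz == z)
        n_zx, s_zx = counts[(z, x)]
        rate += (n_z / total) * (s_zx / n_zx)
    return rate


naive_effect = naive_rate(1) - naive_rate(0)
adjusted_effect = adjusted_rate(1) - adjusted_rate(0)
print(f"naive difference:    {naive_effect:.2f}")
print(f"adjusted difference: {adjusted_effect:.2f}")
```

In this made-up table advertising is mostly switched on in the high season, so the unadjusted comparison mixes the seasonal effect with the advertising effect. Averaging the within-season differences over the season distribution removes that mixing, but only under the assumption that season is the sole confounder.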


PCD4 Dig Deeper and Uncover the Unexpected with JMP 13 and JMP Pro 13		Sat, Feb 25, 2:00 PM - 4:00 PM City Terrace 10
Instructor(s): Mia Stephens, SAS Institute, Inc.; Scott Lee Wise, SAS Institute, Inc.


This session will cover how we can meet the challenge to explore, model and experiment on complex data analytic needs by: • Increasing Ease and Efficiency of Preparing and Accessing Data • Handling and Exploring all Types of Data, including Text • Providing Next Generation Analytical Tools in Quality, DOE & Reliability • Unleashing Advanced Analytics in Predictive Modeling • Improving the Ways to Share and Report Out Analytics and Graphs We will feature new ground-breaking methodology on relevant demos to maximize participant learning.


T1 Understanding and Working with Different (and Sometimes Difficult) People		Sat, Feb 25, 2:00 PM - 4:00 PM City Terrace 7
Instructor(s): Colleen Mangeot, Cincinnati Children's Hospital Medical Center Download Handouts


Do you have coworkers, researchers, or clients that are difficult to work with? Do you feel frustrated and/or confused about how to work with them? Do you wonder sometimes why they just don’t get it? This session will introduce the DISC model for understanding and working with different and sometimes difficult people. It will involve case studies and examples. The result? Improved relationships, increased effectiveness, greater influence, and ability to motivate others.
Outline & Objectives 1. Determine the different communication styles 2. Identify your style and those of others 3. Develop strategies for working with each of the styles About the Instructor Colleen Mangeot's diverse career includes 10 years in the actuarial field, 10 years in coaching and leadership development, and 7 years in biostatistics. Highlights of her coaching business include: successfully working with clients to increase efficiency and sales by 30% or more; attaining the Professional Coach Certification from the International Coach Federation in 2003; serving as a monthly columnist for the Dayton Business Journal; speaking nationally, with over 200 hours of paid speaking engagements; and working as a contractor with the Anthony Robbins Companies. She received her MS in Statistics from Miami University in 2008. She received the NSA National Research Council Fellowship at NIOSH, and worked in statistical quality improvement at the VA. Now, in addition to working in the Biostatistical Consulting Unit at Cincinnati Children’s Hospital Medical Center, she is also an internal coach working with executives to further their careers. She was a panelist for the invited session at JSM 2013, Secrets to Effective Communication for Statistical Consultants. She also had two very well received presentations at CSP 2015 and conducted a successful short course and tutorial at CSP 2016. Relevance to Conference Goals This session will develop important communication skills for career advancement, leadership and management effectiveness, and successful selling for consultants. We all have someone that is difficult to work with. The most successful people are able to work with a variety of people and appreciate and leverage their contributions. Software Packages None.


T2 Penalized Regression Methods for Generalized Linear Models in SAS/STAT		Sat, Feb 25, 2:00 PM - 4:00 PM City Terrace 9
Instructor(s): G Gordon Brown, SAS Institute, Inc.


Regression problems that have large numbers of candidate predictor variables occur in a wide variety of scientific fields and in business. These problems require you to perform statistical model selection to find an optimum model that is simple and has good predictive performance. For linear and generalized linear models you will see how to use the forward, backward, stepwise, and LASSO methods of variable selection. This tutorial presents modern variable selection methods for linear models using the adaptive LASSO, group LASSO, and elastic net penalized regression techniques, plus various screening methods. Penalized regression techniques yield a sequence of models and require at least one tuning method to choose the optimum model that has the minimum estimated prediction error. You will learn how to use fit criteria (such as AIC, SBC, and the Cp statistic), average square error on the validation data, and cross validation as tuning methods for penalized regression. Various examples will be provided using the GLMSELECT and HPGENSELECT procedures of SAS/STAT, which offer extensive customization options and powerful graphs for performing statistical model selection.
Outline & Objectives Outline: 1. Introduction a. Goals of model selection 2. Model Selection methods a. PROC GLMSELECT b. PROC HPGENSELECT c. Traditional Selection Methods d. Modern Selection Methods 3. Penalized Regression Methods a. LASSO b. Adaptive LASSO c. Elastic Net d. Group LASSO e. Validation 4. Model Averaging 5. Screening 6. Summary Objectives 1. Introduce variable selection methods and penalized regression methods 2. Illustrate practical applications of using variable selection methods for both linear and generalized linear models 3. Provide guidelines for choosing the ‘best’ model fitting tools for a given problem 4. Demystify the methodology. About the Instructor Dr. G. Gordon Brown is a Senior Research Statistician in the Statistical Applications R&D department at the SAS Institute. Before joining SAS in 2015 Dr. Brown performed contract research for 14 years specializing in survey data analysis, regression modeling, and environmental statistics. Since joining SAS he has given several presentations and tutorials at various conferences and SAS users' group meetings. He has a Ph.D. in Statistics from North Carolina State University and has been a SAS user since 1989. Relevance to Conference Goals The term ‘Big Data’ typically conjures up images of data sets with a large number of observations. However, it is becoming increasingly common for data sets to have a large number of variables as well. Selecting a set of variables that accurately predict an outcome of interest without overfitting the model is difficult in this situation. The modern penalized regression methods presented in this tutorial provide the data analyst with the tools they need to build parsimonious regression models. Software Packages SAS
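To make the idea of a penalty that selects variables concrete, the sketch below fits a LASSO by cyclic coordinate descent with soft-thresholding. It is a plain Python illustration under stated assumptions, not the algorithm used by PROC GLMSELECT or PROC HPGENSELECT: the objective is taken to be (1/(2n)) * ||y - Xb||^2 + lam * ||b||_1 with no intercept, the tiny orthogonal design is invented, and the function names are hypothetical.

```python
def soft_threshold(value, gamma):
    """Shrink value toward zero by gamma; return exactly 0.0 inside [-gamma, gamma]."""
    if value > gamma:
        return value - gamma
    if value < -gamma:
        return value + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of feature j with the partial residual
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
            rho /= n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / scale
    return beta


# Hypothetical orthogonal design: y = 2.0 * x1 + 0.1 * x2 exactly.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2.1, 1.9, -1.9, -2.1]

print(lasso_coordinate_descent(X, y, lam=0.0))  # no penalty: least squares fit
print(lasso_coordinate_descent(X, y, lam=0.5))  # penalty removes the weak predictor
```

Each value of lam gives one model in the sequence; choosing among them is the tuning step the abstract describes, for example by validation error or cross validation.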


T3 Introduction to Spatial Analysis Through Statistics		Sat, Feb 25, 2:00 PM - 4:00 PM River Terrace 3
Instructor(s): Michael Devin Floyd, Saint Software; Phillip Stedman Floyd, Segal Consulting Download Handouts


The word spatial means related to space or geography. Thus, spatial analysis is an analysis that takes into consideration the location of the observation. This course is about using spatial elements to derive conclusions or eliminate dependence based on location. This course assumes no prior knowledge of spatial analysis. It starts from the beginning by defining spatial data and reasoning why spatial analysis is relevant. Different mapping techniques are explored to visualize spatial information to get a better understanding. Tests for spatial dependence in datasets are discussed. Then, it is shown that spatial dependence can influence results. Spatial regression techniques are discussed to mitigate the spatial dependence. For the conclusion, I talk about how accounting for the spatial dependence influenced the research I did at the Louisiana Public Health Institute. Basic knowledge of regression and linear modeling is assumed.
Outline & Objectives Introduction: Define spatial data and spatial objects to start the discussion of spatial analysis. Talk about mapping techniques that can be used to visualize the spatial data. Give examples in different software packages. Spatial dependence: Give an example of spatial dependent data. Talk about the necessity to deal with the spatial dependence/relevance. Define the spatial weight matrix. Talk about the different ways of creating this matrix. Talk about spatial autocorrelation, global indexes, and local indexes. Describe how they are defined/formed. Comment on the differences. Spatial regression: Spatial weight matrix can be added to any linear model to account for spatial autocorrelation. Two main forms are spatial lag and spatial error models. Describe the difference. Redo previous example with added spatial weight matrix. Give code for different programs. Research example: Talk about research done at the Louisiana Public Health Institute (LPHI). Goal of the research was to determine if the number of tobacco stores present in an area of New Orleans was related to different economic and demographic conditions. Describe modeling process. Talk about results. About the Instructor Phillip Floyd: B.S. Pure Mathematics - Louisiana State University M.S. Statistics - The University of New Orleans 3 years of actuarial and statistical consulting experience GStat Accredited Poster presentation at CSP 2016 Michael Floyd: B.S. Pure Mathematics - The University of Louisiana at Lafayette M.S. Biostatistics - Washington University in St. Louis 2 years of statistical research experience Poster presentation at CSP 2016 Phillip Floyd was a statistical consultant for the Louisiana Public Health Institute where he used spatial techniques to analyze the data of a tobacco study in the city of New Orleans. His methods would be used continuously in the future so longitudinal results can be formed and causation can be concluded. 
A paper was pending publication when his statistical consulting work ended. Relevance to Conference Goals Spatial analysis can be used across many industries. Whenever an analysis includes a variable derived from location, spatial effects should at least be tested for. If they are found to be significant, spatial techniques can easily be added to any linear model. Many techniques are available, but the basic concept is to add a weight matrix to the model to remove the autocorrelation in the data. It’s not a topic seen often, but a statistician should be aware that these techniques exist in case the need arises. Software Packages Examples will be given in SAS, R, and Stata. The code will be given for all of the examples.
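As a rough illustration of the spatial weight matrix and global autocorrelation index mentioned in this course description, the sketch below computes Moran's I in plain Python. It is not taken from the course materials, which use SAS, R, and Stata. The helper names (chain_weights, morans_i) and the toy data are assumptions made only for this example, and the binary "neighbors on a line" weight matrix is just one of several ways such a matrix can be built.

```python
def chain_weights(n):
    """Binary contiguity weights for n locations arranged on a line.

    Hypothetical helper for illustration: location i is a neighbor of
    locations i - 1 and i + 1 (weight 1); all other weights are 0.
    """
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = 1.0
        w[i + 1][i] = 1.0
    return w


def morans_i(values, weights):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z).

    z holds the deviations of the values from their mean and S0 is the
    sum of all entries of the weight matrix W.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))
    total = sum(zi * zi for zi in z)
    return (n / s0) * (cross / total)


if __name__ == "__main__":
    w = chain_weights(4)
    # Smooth trend along the line: neighbors are similar, so I is positive.
    print(morans_i([1.0, 2.0, 3.0, 4.0], w))
    # Alternating values: neighbors are dissimilar, so I is negative.
    print(morans_i([1.0, -1.0, 1.0, -1.0], w))
```

Values of I well above zero suggest that neighboring locations tend to have similar values, and values well below zero suggest the opposite. In practice a significance test (for example, a permutation test) would accompany the index; that step is omitted here.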


T4 How to Find (the Right) Clients for Your Independent Statistical Consulting Business		Sat, Feb 25, 2:00 PM - 4:00 PM River Terrace 2
Instructor(s): Karen Grace-Martin, The Analysis Factor


If you are starting an independent statistical consulting business, you will need to learn many business skills. The most important, yet most intimidating, of these is finding and attracting clients. Clients will hire (and re-hire) you only if they know, like, and trust you. This will happen only when you build a solid marketing process that conveys your strengths and what you can offer to the right clients. In this tutorial, you will learn how to get started on a simple, yet solid, marketing plan that allows the right clients to know, like, and trust you. The instructor will share her personal experiences and case studies of colleagues who built a consulting business, and will guide you through small group exercises.
Outline & Objectives The tutorial will be set up in two parts. In the first, we will build the foundation of your marketing plan. This will include deciding what you want to communicate and to whom. We’ll focus on some fundamentals of establishing credibility through how you present yourself and your business in written and web material. In the second part, the instructor will share approaches to developing your reputation as an expert statistician and getting your message out to the world. Social media offers many new and interesting ways to get recognition, but you should not ignore some of the more traditional approaches. Both parts will include small group exercises to develop a message and strategy that emphasize your unique skills and strengths. About the Instructor Karen Grace-Martin is the founder of The Analysis Factor LLC, which provides statistical consulting and training to researchers. She was previously a statistical consultant at Cornell University for seven years. She has consulted on thousands of research projects, from undergraduate honors theses to large-scale randomized trials. She is well versed in the challenges, rewards, and differences in consulting as an academic employee and as a self-employed business owner. She runs the popular The Analysis Factor blog, StatWise newsletter, and The Data Analysis Brown Bag webinar series. Learn more about Karen at http://TheAnalysisFactor.com. Relevance to Conference Goals This tutorial directly addresses the communication, impact, and career development theme of this conference. If you are beginning a career as an independent statistical consultant, you will need to develop management competencies in marketing and promotion. Your clients need to know who you are and what expertise you can provide them. Your success as an independent consultant will depend on your ability to communicate clearly and interact in a friendly but professional manner. 
This helps you build a collaborative working relationship that will win you repeat business and referrals. Software Packages Not applicable.


GS2 Closing General Session		Sat, Feb 25, 4:15 PM - 5:30 PM River Terrace 2



The Closing Session is your opportunity to interact with the CSP Steering Committee in an open discussion about how the conference went. CSPSC vice chair Jean Adams will lead a panel of committee members as they summarize their conference experience. The audience will then be invited to ask questions and provide feedback. The committee highly values suggestions for improvement gathered during this time. The award for best student poster will also be presented during the Closing Session, and each attendee will have an opportunity to win a door prize.


ASA Meetings Department
