Online Program


Thursday, February 15
Registration Thu, Feb 15, 7:00 AM - 6:30 PM

SC1 Introduction to Big Data Analysis Thu, Feb 15, 8:00 AM - 5:30 PM
Salon A
Instructor(s): Fulya Gokalp Yavuz, Yildiz Technical University; Mark Daniel Ward, Purdue University
This one-day introductory workshop is geared toward CSP participants who want to revitalize or improve their data analysis skills, especially with an emphasis on big data. Ward and Gokalp Yavuz will present tools and techniques for these most fundamental, low-level aspects of data analysis. We are well versed in teaching such techniques to students who have no background in data analysis or programming. This workshop will bring people up to speed with powerful techniques for data analysis. This one-day course has no prerequisites. This workshop will be hands-on and driven by examples, using large data sets. The intended participants for the course are people who work in a data-driven environment and have an increasing need to perform aspects of large data analysis. Before data is gathered and organized, a great deal of data manipulation is necessary, especially for working with big data sets. Sometimes the data need to be scraped from remote sources, and then parsed into more natural forms. This process often involves munging and cleaning the data. The need to be able to reproduce and reliably verify all of the methods used for the data wrangling is more important than ever.

SC2 An Introduction to D3.js: From Scattered to Scatterplot Thu, Feb 15, 8:00 AM - 5:30 PM
Salon B
Instructor(s): Scott Murray, O’Reilly Media
Interested in coding data visualizations on the web, but don't know where to start? This workshop will have you transforming data into visual images in no time at all, starting from scratch and building an interactive scatterplot by the end of the session. We'll use d3.js, the web's most powerful library for data visualization, to load data and translate values into SVG elements — drawing lines, points, and scaled axes to label our data. We’ll learn how to use motion and visual transitions, and introduce simple interactivity to make our charts more explorable.

All methods and examples will be up-to-date for the current version of D3 (4.x as of this writing).

SC3 Collaboration Essentials for Practicing Statisticians and Data Scientists Thu, Feb 15, 8:00 AM - 12:00 PM
Salon C
Instructor(s): Heather Smith, Cal Poly; Eric Vance, LISA--University of Colorado Boulder
Statisticians and data scientists positively impact many people, organizations, and governments through the careful collection, analysis, and interpretation of data to solve problems and make decisions. To maximize their impact, statisticians and data scientists must effectively collaborate with a variety of domain experts who originate the data or the problems to be solved. In this short course, participants will learn and practice essential skills to improve their professional communication and collaboration to increase their effectiveness on the job. Specifically, participants will learn how to establish foundational collaborative relationships with domain experts; structure effective meetings; and effectively communicate with non-statisticians. Participants will also practice their newly acquired skills and learn how to improve their proficiency in these essential collaboration skills by using role-plays and video coaching and feedback reviews outside of this short course. In sum, participants will learn and practice how to leverage their technical skills to more effectively collaborate for maximal impact inside and outside of their organizations.

SC4 A Variety of Mixed Models: Linear, Generalized Linear, and Nonlinear Thu, Feb 15, 8:00 AM - 12:00 PM
Salon E
Instructor(s): David A. Dickey, NC State University
The MIXED procedure in SAS, for example, correctly handles linear models that have multiple sources of random effects such as random town to town, store to store, and aisle to aisle variation in sales. Associated fixed effects might be product price, color of packaging and amount spent on advertising. The talk begins with a checklist for deciding when to treat effects as random versus fixed and follows with a series of examples. When the response variable is not normal, for example with a binary or Poisson response, additional complexities arise. Models with such non-normal responses are often analyzed by assuming that some transformation, or link function, of the expected value of Y results in a linear model with fixed and random effects. We are then in the generalized linear mixed model setting. It may be that a model cannot be linearized by a transformation, thus making it a nonlinear model. If random effects are involved the model is referred to as a nonlinear mixed model. With a minimal amount of theory and an emphasis on examples, these types of models will be explained and illustrated. SAS will be used but the ideas and interpretation are software independent.

SC5 Cleaning Up the Data Cleaning Process: Challenges and Solutions in R Thu, Feb 15, 8:00 AM - 12:00 PM
Salon D
Instructor(s): Claus Thorn Ekstrøm, Biostatistics, University of Copenhagen; Anne Helby Petersen, Biostatistics, University of Copenhagen
Data cleaning and validation are the first steps in any data analysis, as the validity of the conclusions from the analysis hinges on the quality of the input data. Mistakes in the data can arise for any number of reasons, including erroneous codings, malfunctioning measurement equipment, and inconsistent data generation manuals. We present a systematic, analytical approach to data cleaning that will ensure the data cleaning process is just as structured and well-documented as the rest of the data analysis. The primary software tool is the dataMaid R package, which implements an extensive and customisable suite of quality assessment tools that can be used to identify potential problems in a dataset. The results are summarised in an auto-generated, non-technical, stand-alone document readable by statisticians and non-statisticians alike. Thus, the course teaches practical skills that aid the dialogue between data analysts and field experts, while also providing easy documentation of reproducible data cleaning steps and data quality control.

SC6 Effective Presentation for Statisticians and Data Scientists: Success=(PD)^2 Thu, Feb 15, 1:30 PM - 5:30 PM
Salon C
Instructor(s): Jennifer H. Van Mullekom, Virginia Tech
Statisticians must be able to effectively convey their ideas to clients, collaborators, and decision-makers. Presenting in the modern world is even more daunting when speakers have the opportunity to employ slideware, videos, and live demos. Unfortunately, university coursework and professional development programs are often not targeted towards sharpening these skills. This short course, developed and taught by statisticians, will provide an opportunity to learn how to employ different methods and tools in the phases of the framework taught. The material covered in the course is geared toward data-based presentations and is based on the works of Garr Reynolds and Michael Alley, among others. The course will emphasize the importance of stepping away from the computer to Prepare an effective message aimed at your core point guided with a series of questions and tips. The Design phase emphasizes the importance of structure, streamlining, and good graphic design accompanied by a series of checklists. Of course, “Practice makes perfect” so we cannot skip this step. Finally, engaging the audience and effectively using the room and equipment is covered in the Deliver phase.

SC7 Statistical Learning Methods in R Thu, Feb 15, 1:30 PM - 5:30 PM
Salon E
Instructor(s): Kelly Sue McConville, Swarthmore College
Applied statisticians are often confronted with difficult modeling problems where standard regression approaches are not appropriate. For example, it may be that the number of possible predictors is large relative to the sample size or that the relationship between the variables is non-linear. This course will cover several statistical learning techniques which are designed to handle these difficult modeling problems. In particular, we will study penalized regression techniques (lasso, ridge, elastic net), non-parametric regression (regression and smoothing splines), and classification methods (support vector machines, trees). Using data from the Bureau of Labor Statistics, participants will learn how to fit these models in R. R Markdown files with the relevant code will be provided so that participants can actively follow along with the demonstrations.

SC8 NISS Shortcourse: A Survey of Modern Data Science Thu, Feb 15, 1:30 PM - 5:30 PM
Salon D
Instructor(s): David Banks, Dept. of Statistical Science, Duke University
Modern data science is driven by applications, and these often entail Big Data and machine learning perspectives. This short course reviews key ideas and methods in nonparametric regression (starting with cross-validation and light bootstrap asymptotics, then moving on to the additive model, the generalized additive model, and neural networks). It also covers variable selection, with the Lasso and the Median Model, and describes the p >> n problem in the context of contributions by Candes and Tao, Donoho and Tanner, and Wainwright. The course next treats classification, with emphasis upon Random Forests, boosting, and ensemble strategies such as bagging and stacking.

PS1 Poster Session 1 and Opening Mixer Thu, Feb 15, 5:30 PM - 7:00 PM
Salons F-I
Chair(s): Alok Dwivedi, Texas Tech University Health Sciences Center El Paso (TTUHSC EP)

Some Dimension Reduction Strategies for the Analysis of Survey Data
Jiaying Weng, University of Kentucky
Perl-Compatible Regular Expressions as a Tool to Abstract Semi-Structured Electronic Health Records
Samantha Emily Montag, Northwestern University
Collaborative Process to Efficiently Produce Publications in Multicenter Research
Cody S. Olsen, University of Utah, Department of Pediatrics
Developing a Comprehensive Personal Plan for Teleworking (Working Remotely)
Julia Lull, Janssen Research & Development, LLC
Thank You, Come Again: Modeling Repeat Purchase Behavior for Business Travelers
Diag D. Davenport, Georgetown University
Wavelet-Based Methods for Data-Driven Monitoring
Achraf Cohen, University of West Florida
A Simulation Study of Violations of the Local Independence Assumption in Latent Class Analyses
Michael P. Chen, U.S. Centers for Disease Control and Prevention
Impact of Linear Regression Predictor Omission on Estimation and Inference
Julia L. Sharp, Colorado State University
A Comparison of Standard Logistic Regression, Multilevel Modeling, Robust Error Estimation, and Exposure Simulation for Data Containing Quasi-Berkson Error
Angelique Liddell Zeringue, Mercy Healthcare
Statistical Modeling for Repeated Measures in Rubber Research
Wenzhao Yang, The Dow Chemical Company
Combining Historical Data and Propensity Score Methods in Observational Studies to Improve Internal Validity
Miguel Marino, Oregon Health & Science University
Marketing Communication Channel Preference Optimization Using a Two-Stage Statistical Modeling
Hongying Yang, Statistical consultant
Limitations of Propensity Score Methods: Demonstration Using a Real-World Example
Gregory B. Tallman, Oregon State University/Oregon Health & Science University
Effect Size Measures for Nonlinear Count Regression Models
Stefany Coxe, Florida International University
Appropriate Dimension Reduction for Sparse, High-Dimensional Data Using Intensity Plots and Other Visualizations
Eugenie Jackson, West Virginia University
Navigating Large-Scale Forest Plots Using R and Shiny
Steele Valenzuela, Oregon Health & Science University
Ranked-Choice Voting R Package
Jay Lee, Reed College
Exhibits Open Thu, Feb 15, 5:30 PM - 7:00 PM
Salons F-I

Friday, February 16
Registration Fri, Feb 16, 7:30 AM - 5:30 PM

Continental Breakfast Fri, Feb 16, 7:30 AM - 8:30 AM
Salons F-I

Exhibits Open Fri, Feb 16, 7:30 AM - 6:30 PM
Salons F-I

GS1 Keynote Address Fri, Feb 16, 8:00 AM - 9:00 AM
Salon E
Chair(s): Kim Love, K. R. Love Quantitative Consulting and Collaboration

8:05 AM Reflections on Career Opportunities and Leadership in Statistics
Lisa LaVange, The University of North Carolina
CS01 #LeadWithStatistics Fri, Feb 16, 9:15 AM - 10:45 AM
Salon A
Chair(s): Sejong Bae, Comprehensive Cancer Center, University of Alabama

9:20 AM Q&A with Lisa LaVange
Lisa LaVange, The University of North Carolina
10:05 AM Developing and Delegating: Two Key Strategies to Master as a Technical Leader
Diahanna L. Post, Nielsen, Columbia University
CS02 Practical Considerations for Modeling Fri, Feb 16, 9:15 AM - 10:45 AM
Salons BC
Chair(s): Trijya Singh, Le Moyne College

9:20 AM Evaluating Model Fit for Predictive Validity
Katherine M. Wright, Northwestern University
10:05 AM Flexible Modeling and Experimental Design Strategies
Timothy E. O'Brien, Loyola University Chicago
CS03 Text Analytics Applications Fri, Feb 16, 9:15 AM - 10:45 AM
Salon D
Chair(s): Steven Cohen, RTI International

9:20 AM Approachable, Interpretable Tools for Mining and Summarizing Large Text Corpora in R
Luke W. Miratrix, Harvard University
10:05 AM Latent Dirichlet Allocation Topic Models Applied to the Center for Disease Control and Prevention’s Grant
Matthew Keith Eblen, Centers for Disease Control and Prevention
CS04 Working with Messy Data Fri, Feb 16, 9:15 AM - 10:45 AM
Salon E
Chair(s): Karol Krotki, RTI

9:20 AM Practical Time-Series Clustering for Messy Data in R
Jonathan Robert Page, University of Hawaii Economic Research Organization (UHERO)
10:05 AM Doing Data Linkage: A Behind-the-Scenes Look
Clinton J. Thompson, National Center for Health Statistics, CDC
CS05 Collaboration Essentials Fri, Feb 16, 11:00 AM - 12:30 PM
Salon A
Chair(s): Terrie Vasilopoulos, University of Florida, College of Medicine

11:05 AM Asking Great Questions
Eric Vance, LISA--University of Colorado Boulder
11:50 AM Listening, Paraphrasing, and Summarizing
Heather Smith, Cal Poly
CS06 Bayesian Applications Fri, Feb 16, 11:00 AM - 12:30 PM
Salons BC
Chair(s): Mariangela Guidolin, Department of Statistical Sciences, University of Padua

11:05 AM Bayesian Inference for Stochastic Processes
Lyle David Broemeling, University of Texas MD Anderson Cancer Center
11:50 AM Forecasting Periodic Accumulating Processes with Semiparametric Distributional Regression Models and Bayesian Updates
Harlan D. Harris, WayUp
CS07 Exploring Big Data Fri, Feb 16, 11:00 AM - 12:30 PM
Salon D
Chair(s): Christina Phan Knudson, University of St. Thomas

11:05 AM Exploratory Data Structure Comparisons by Use of Principal Component Analysis
Anne Helby Petersen, Biostatistics, University of Copenhagen
11:50 AM Tools for Exploratory Data Analysis
Wendy L. Martinez, U.S. Bureau of Labor Statistics
CS08 Streamlining Your Work Using Apps Fri, Feb 16, 11:00 AM - 12:30 PM
Salon E
Chair(s): Blake Langlais, Mayo Clinic

11:05 AM Mechanizing Clinical Review Processes with R Shiny for Efficiency and Standardization
Jimmy Wong, Food and Drug Administration
11:50 AM Building Shiny Apps: With Great Power Comes Great Responsibility
Jessica Minnier, Oregon Health & Science University
Lunch (On Own) Fri, Feb 16, 12:30 PM - 2:00 PM

CS09 Presenting and Storytelling Fri, Feb 16, 2:00 PM - 3:30 PM
Salon A
Chair(s): Shasha Bai, University of Arkansas for Medical Sciences

2:05 PM How to Give a Really Awful Presentation
Paul Teetor, William Blair & Co
2:50 PM Telling the Story of Your Stats
Jennifer H. Van Mullekom, Virginia Tech
CS10 Propensity Scores and Resampling Methods Fri, Feb 16, 2:00 PM - 3:30 PM
Salons BC
Chair(s): Christine Wells, UCLA Statistical Consulting Group

2:05 PM CANCELED: A Streamlined Process for Conducting a Propensity Score-Based Analysis
John A. Craycroft, University of Louisville
2:50 PM Resampling Methods for Statistical Inference on Multi-Rater Kappas
Chia-Ling Kuo, University of Connecticut Health
CS11 Data Mining Algorithms Fri, Feb 16, 2:00 PM - 3:30 PM
Salon D
Chair(s): Abbass Sharif, University of Southern California

2:05 PM Stochastic Gradient Boosting on Distributed Data
Roxy Cramer, Rogue Wave Software
2:50 PM Deep Neural Networks for Scalable Prediction
Lynd Bacon, Loma Buena Assoc./Notre Dame Univ./Northwestern Univ.
CS12 Education to Practice and Data Visualization Fri, Feb 16, 2:00 PM - 3:30 PM
Salon E
Chair(s): Chester Ismay, DataCamp

2:05 PM What Is Happening at the School Level and Why It Is Important to Statistical Practice
Jane Watson, University of Tasmania
2:50 PM The Life-Cycle of a Project: Visualizing Data from Start to Finish
Nola du Toit, NORC at the University of Chicago
CS13 Managing Up Fri, Feb 16, 3:45 PM - 5:15 PM
Salon A
Chair(s): Ronald Gangnon, University of Wisconsin School of Medicine and Public Health

3:50 PM What Does It Take for an Organization to Make Difficult Information-Based Decisions? Using the Oregon Department of Forestry’s RipStream Project as a Case Study
Jeremy Groom, Groom Analytics
4:35 PM Statistics for Management of an Organization
Joyce Nilsson Orsini, Fordham University Graduate School of Business
CS14 Working with Health Care Data Fri, Feb 16, 3:45 PM - 5:15 PM
Salons BC
Chair(s): Melanie Edwards, Exponent

3:50 PM Application of Support Vector Machine Modeling and Graph Theory Metrics for Disease Classification
Jessica Michelle Rudd, Kennesaw State University
4:35 PM Assessing Correspondence Between Two Data Sources Across Categorical Covariates with Missing Data: Application to Electronic Health Records
Emile Latour, Oregon Health & Science University
CS15 Statisticians Teaching Fri, Feb 16, 3:45 PM - 5:15 PM
Salon D
Chair(s): Georgette Asherman, Direct Effects, LLC

3:50 PM Should I Bring a Basket of Fish or Some Fishing Poles?
Kathy Hall, Hewlett Packard
4:35 PM Engaging Undergraduates in Statistical Consulting
Christina Phan Knudson, University of St. Thomas
CS16 Novel Applications of Data Visualization Fri, Feb 16, 3:45 PM - 5:15 PM
Salon E
Chair(s): Mary Grace Crissey, CSRA

3:50 PM Warranty/Performance Text Exploration for Modern Reliability
Scott Lee Wise, SAS Institute, Inc.
4:35 PM Improving the Data Customer’s Ability to Visualize Historical Agricultural Data at the National Agricultural Statistics Service
Irwin Anolik, USDA-NASS
PS2 Poster Session 2 and Refreshments Fri, Feb 16, 5:15 PM - 6:30 PM
Salons F-I
Chair(s): S. Keith Anderson, Mayo Clinic

Data, Data Everywhere …, but Mind the Disclaimers: Benefits and Costs of Matching Large Cohorts to Individual US Mortality Case Data in the NDI, SSA Death Master File (DMF/SSDI), and More
Sigurd Wilson Hermansen, Westat
Curating and Visualizing Big Data from Wearable Activity Trackers
Meike Niederhausen, OHSU-PSU School of Public Health
Consensus Strategy for Variable Selection in Clinical Prediction Rule Development
Miriam R. Elman, OHSU/OSU College of Pharmacy
Reproducible Research Implemented Through Version Control Systems
Lillian S. Lin, Montana State University
The Boeing Applied Statistics ToolKit: Best Practices and Tools for Collaboration and Reproducibility in High-Throughput Consulting
Robert Michael Lawton, Boeing Research & Technology
Empirical Comparisons of Differential Expression Analysis Pipelines for RNA-Sequencing Data
Lina Gao, Biostatistics Shared Resource (OHSU BSR); Biostatistics and Bioinformatics Unit (ONPRC BBU)
A Practical Guide for Modeling Length of Stay with Focus on Right Skewness and Zero Inflation
Lizhou Nie, Stony Brook University
Nonparametric Estimation of Time-Variant Quantiles and Statistical Models
Jessica Michelle Rudd, Kennesaw State University
Estimating the Relative Excess Risk Due to Interaction in Clustered Data Settings
Katharine Fischer Berry Correia, Harvard T.H. Chan School of Public Health
Spatial Analysis of Fukushima Thyroid Ultrasound Examination Survey Data
Emerson H. Webb, Reed College
A Growth Reference for Mid-Upper-Arm Circumference for Age Among School-Age Children and Adolescents, with Validation for Mortality in Two Cohorts
Lazarus K. Mramba, University of Florida
Machine Learning Methods for Predicting Zygosity
Ally Rochelle Avery, Washington State University
Simulating Real-World Data with Time-Varying Variables
Maria Emilia de Oliveira Montez-Rath, Stanford University
Evaluating the Effectiveness of the Flipped Classroom Model Using Structural Equation Modeling
Shan Wang, Assistant Professor
Software for Covariate Specification in Linear, Logistic, and Survival Regression
Sai Liu, Stanford University
Exploratory Analyses from Different Forms of Interactive Visualizations
Lata Kodali, Virginia Tech
Using SAS Programming to Create Complex Paneled Graphs from Electronic Health Records
Carrie Tillotson, OCHIN, Inc.
An Algorithm to Identify Family Linkages Using Electronic Health Record Data
Megan Hoopes, OCHIN, Inc.
Saturday, February 17
Registration Sat, Feb 17, 7:30 AM - 2:30 PM

Exhibits Open Sat, Feb 17, 7:30 AM - 1:00 PM
Salons F-I

PS3 Poster Session 3 and Continental Breakfast Sat, Feb 17, 8:00 AM - 9:15 AM
Salons F-I
Chair(s): Edward Mulrow, NORC at the University of Chicago

Thematic Feature Selection for Research Support
Thealexa Becker, Federal Reserve Bank of Kansas City
Systematizing Your Statistical Consulting Practice
Terrie Vasilopoulos, University of Florida, College of Medicine
Sixteen Personalities at Work
Katherine Eleanor Tranbarger Freier, Intel Corporation
Re-Examining Sick Quitter Hypothesis on Association of Alcohol Consumption with Coronary Heart Disease
Amy Z. Fan, National Institutes of Health
Comparisons of Propensity Score Analysis for Analyzing Rare Binary Outcome
Jihye Park, Stony Brook University
Understanding Graduate School Speed-Dating with Generalized Linear Mixed Models
Christina Phan Knudson, University of St. Thomas
Data Modeling to Mitigate the Impact of Missing Data in a Longitudinal Study of Injecting Drug Users
Tania Amanda Patrao, University of Queensland, Australia
Multivariate Statistical Analysis in Plastic Foam Research
Wenyu Su, The Dow Chemical Company
Win Ratio Application for a Composite Outcome in a Randomized Cardiovascular Trial
Rose A. Hamershock, TIMI Study Group
Statistical Analysis of Network Change
Teresa Danielle Schmidt, Portland State University
Exploring Data Quality and Time Series Event Detection in 2016 US Presidential Election Polls
Kaelyn M. Rosenberg, Reed College
Understanding and Using Ordinal Factor Analysis
Nivedita Bhaktha, The Ohio State University
An Easy-to-Use SAS® Macro for a Descriptive Statistics Table with P-Values
Yuanchao Zheng, Stanford University
Animated Data Visualization with Plotly: Useful Tool for Health Care Quality Improvement
Eric A. Tesdahl, SpecialtyCare, Inc.
Using Accessible Patient Data to Individualize Sample Timing for Pharmacokinetic Studies
Matthew Stephen Shotwell, Vanderbilt University Medical Center
CS17 Passion for Statistics Sat, Feb 17, 9:15 AM - 10:45 AM
Salon A
Chair(s): Kathleen A. Jablonski, The George Washington University

9:20 AM Am I Supposed to Enjoy My Job? Career Observations from a Biostatistician
Daniel Thomas Cotton, Boehringer Ingelheim Pharmaceuticals
10:05 AM Statistics in the Wild: Practicing Statistics in Nontraditional Places, from a Tiny Island in the Pacific to the Federal Cabinet
Heather Krause, Datassist
CS18 Survival Analysis v. 'Survival' Analysis Sat, Feb 17, 9:15 AM - 10:45 AM
Salons BC
Chair(s): Yulia Marchenko, StataCorp LLC

9:20 AM 'How Long Would You Wait?' Using Time-to-Event (Survival) Analysis to Explore Waiting Times
Ruth Hummel, SAS Institute
10:05 AM Statistical Methods for National Security Risk Quantification and Optimal Resource Allocation
Robert Brigantic, Pacific Northwest National Laboratory
CS19 Business Intelligence Applications Sat, Feb 17, 9:15 AM - 10:45 AM
Salon D
Chair(s): Chris Holloman, ICC

9:20 AM Business Intelligence (BI) Reporting Solution: From Source to Nuts
Andrew Piskorowski, Survey Research Center, University of Michigan
10:05 AM Location Analytics: An Application of GIS
Moxie Zhang, Esri (China)
CS20 Understanding Populations Sat, Feb 17, 9:15 AM - 10:45 AM
Salon E
Chair(s): Shelley DeVost, Los Angeles LGBT Center

9:20 AM Quantifying Populations in Proximity to Oil and Gas Development: A National Spatial Analysis and Review
Tanja Srebotnjak, Harvey Mudd College
10:05 AM Approaches and Techniques for Estimating the Total Number of Species in a Population, with Emphasis on Application to Mineral Species
Grethe Hystad, Purdue University Northwest
CS21 Developing Communication Skills Sat, Feb 17, 11:00 AM - 12:30 PM
Salon A
Chair(s): Cynthia R. Long, Palmer Center for Chiropractic Research, Palmer College of Chiropractic

11:05 AM How to Communicate Statistics, and How Statisticians Should Communicate
Achim Guettner, Novartis Pharma
11:50 AM PANEL: Communication Skills: What's Next
Lillian S. Lin, Montana State University; Kim Love, K. R. Love Quantitative Consulting and Collaboration; Alicia Toledano, Biostatistics Consulting, LLC; Eric Vance, LISA--University of Colorado Boulder
CS22 Small Sample Sizes and Non-Probability Sampling Sat, Feb 17, 11:00 AM - 12:30 PM
Salons BC
Chair(s): Amy Laird, Oregon Clinical & Translational Research Institute (OCTRI)

11:05 AM Quantifying and Incorporating Sources of Variability and Uncertainty in Statistical Analyses with Very Small Sample Sizes
Annette M. Bachand, Ramboll Environ
11:50 AM Non-Probability Sampling: Wave of the Future in Survey Research?
Karol Krotki, RTI
CS23 Data Science Applications Sat, Feb 17, 11:00 AM - 12:30 PM
Salon D
Chair(s): Sarah Burgoyne, Claritas

11:05 AM Recent Advances in the Analysis and Detection of Communities in a Network
Frederick Kin Hing Phoa, Institute of Statistical Science, Academia Sinica
11:50 AM Firehose Data Science: Real-Time Analytics of Twitter Feeds
David Corliss, Ford Motor Company
CS24 Causal Inference Sat, Feb 17, 11:00 AM - 12:30 PM
Salon E
Chair(s): Larisa G. Tereshchenko, Oregon Health and Science University

11:05 AM Causal Inference with Multilevel Data Structures
Luke Keele, Georgetown
11:50 AM A Decision Tool for Causal Inference and Observational Data Analysis Methods in Comparative Effectiveness Research (DECODE CER)
Douglas Landsittel, University of Pittsburgh
Lunch (On Own) Sat, Feb 17, 12:30 PM - 2:00 PM

PCD1 Deploying Quantitative Models as 'Visuals' in Popular Data Visualization Platforms Sat, Feb 17, 2:00 PM - 4:00 PM
Salon E
Instructor(s): Daniel Fylstra, Frontline Systems Inc.
Data visualization and business intelligence tools such as Tableau and Power BI have become extremely popular in recent years. Tableau reports that over 90% of Fortune 500 companies are now customers, while Microsoft reports that over 200,000 organizations of all sizes are using Power BI. These tools currently offer easy-to-use access to many data sources, powerful facilities for "slicing and dicing" data, and rich, flexible data visualization, but only limited built-in analytics methods.

A new avenue has emerged in the past year for extending analytics methods in both Tableau and Power BI. It provides a new way for an analyst to develop quantitative models outside these platforms, then deploy them as 'visuals' inside Tableau and Power BI, in 'dashboards' which are often published for use by thousands of users in an organization. Though originally conceived as a way to extend the range of visualization styles, these components can perform arbitrary computations on data before it is rendered in visual form.

In this session, Excel Solver developer Frontline Systems, one of the first to explore this new avenue, will demonstrate use of its tools to automatically convert existing quantitative models into 'visuals' for both Tableau and Power BI. Among other options, this enables an analyst to convert a predictive (data mining, machine learning) or prescriptive (optimization, simulation) model from Microsoft Excel into an easily deployed 'visual' with just two mouse clicks. No programming is required, but the ability to extend models using high-level RASON modeling language code or programming language code is available. These 'visuals' are full-fledged models that easily connect to any Tableau or Power BI data source, and re-solve the underlying problem whenever the data sources are refreshed.

PCD2 Handling Missing Data Using Multiple Imputation Sat, Feb 17, 2:00 PM - 4:00 PM
Salons BC
Instructor(s): Yulia Marchenko, StataCorp LLC
This workshop will cover the use of Stata to perform multiple-imputation analysis. Multiple imputation (MI) is a simulation-based technique for handling missing data. The course will provide a brief introduction to multiple imputation and will demonstrate how to perform multiple imputation in Stata. The three stages of MI (imputation, completed-data analysis, and pooling) will be discussed with accompanying Stata examples. Imputation using multivariate normal (MVN) and using chained equations (MICE, FCS) will be discussed. A number of examples demonstrating how to efficiently manage multiply imputed data within Stata will also be provided. Linear and logistic regression analysis of multiply imputed data as well as several postestimation features will be presented. No prior knowledge of Stata is required, but basic familiarity with multiple imputation will prove useful.
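As a rough illustration of the pooling stage mentioned in the abstract above, the sketch below combines per-imputation results with Rubin's rules. It is written in Python rather than Stata and is not taken from the course materials. The function name `pool_rubin` and the input numbers are hypothetical, and the imputation and completed-data analysis stages are assumed to have already produced one point estimate and one squared standard error per imputed data set.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m completed-data results with Rubin's rules.

    estimates: point estimates, one per imputed data set
    variances: squared standard errors, one per imputed data set
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputations")
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical results from three imputed data sets
pooled_estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled_estimate, total_variance)
```

The total variance adds the between-imputation spread, inflated by 1 + 1/m, to the average within-imputation variance, so the pooled standard error reflects the uncertainty due to the missing data.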

T1 Engage the Room: Mastering Your Personal Presentation Style Sat, Feb 17, 2:00 PM - 4:00 PM
Salon A
Instructor(s): Duncan Burl Gilles, Art of Problem Solving
As confident as we may be in the quality of our work, presentation can make or break the impact it has. Engaging the room and communicating clearly can make the difference between an unimpressed, bored audience and a thrilled audience eager to learn more. This course will focus on presentation techniques that help you communicate your ideas effectively and in an engaging manner. You’ll be trained on ways to draw your audience into your talk, engage them in active listening and thinking, and use your voice and the space of the room to command attention and convey your message. These are skills applicable in many areas – whether presenting your work to clients, teaching in the classroom, one-on-one interviews or discussions, and even CSP talks! After the talk, participants will have the chance to send a short video of a talk to the presenter for review and feedback.

T2 Applying Propensity Score Methods to Observational Studies Using R and SAS Sat, Feb 17, 2:00 PM - 4:00 PM
Instructor(s): Wei Pan, Duke University
Observational studies are common in applied settings but pose threats to the validity of causal inference due to selection bias in the data. Propensity score methods have been increasingly used as a means of reducing selection bias to enhance the causal claims. A training course on the application of propensity score methods to observational studies using commonly used statistical software would be beneficial for applied statisticians and researchers to improve the quality of their observational studies. With this objective, the proposed course will introduce basic concepts and practical issues of propensity score methods, including matching, stratification, and weighting; the instructors will facilitate hands-on activities of applying propensity score methods to observational studies with real-world examples using R and SAS. No prior knowledge of propensity score methods or computer programming is required. Participants are encouraged to bring their own laptop computers for hands-on activities.

T3 A Workshop on Validation of Discrete Response Statistical Models Sat, Feb 17, 2:00 PM - 4:00 PM
Instructor(s): Raul Eduardo Avelar Moran, Texas A&M Transportation Institute
Count models are widely used to analyze discrete data in various fields. When the intent of the analysis is prediction, model validation is an important step before the model can be offered with confidence to final users. This tutorial will discuss when and why to validate, and will demonstrate model validation techniques specific to discrete response models, such as Poisson and Negative Binomial Generalized Linear Regression Models.

T4 Tools for Connecting R, SAS, and Stata to Word: A Practical Approach to Reproducibility Sat, Feb 17, 2:00 PM - 4:00 PM
Salon D
Instructor(s): Abigail S. Baldridge, Northwestern University; Leah J. Welty, Northwestern University
Reproducibility, wherein data analysis and documentation are sufficient so that results can be recomputed or verified, is an increasingly important component of statistical practice. “Weaving” tools such as R Markdown facilitate reproducibility by combining narrative text and analysis code in one plain-text document, but are of limited use when manuscripts or reports must be generated in MS Word (e.g. due to journal requirements or client preference). This course will: (1) summarize how weaving tools create Word documents, and the ensuing limitations; and (2) introduce an alternate approach using recently released StatTag software. StatTag is a free, open-source program that embeds results (values, tables, figures, or verbatim output) from R, SAS, or Stata directly in Word such that they can be automatically updated if code or data changes. This course is intended for a broad audience; prerequisites are experience preparing documents in Word and conducting analysis in any one of R, SAS, or Stata. The workshop will provide practical, hands-on examples drawn from R, SAS, and Stata, and will include an overview of weaving approaches as well as an introduction to StatTag.

GS2 Closing General Session Sat, Feb 17, 4:15 PM - 5:30 PM
Salon E
Chair(s): Eric Vance, LISA--University of Colorado Boulder
The Closing Session is an opportunity for you to interact with the CSP Steering Committee in an open discussion about how the conference went and how it could be improved in future years. CSPSC vice chair, Eric Vance, will lead a panel of committee members as they summarize their conference experience. The audience will then be invited to ask questions and provide feedback. The committee highly values suggestions for improvements gathered during this time. The best student poster will also be awarded during the Closing Session, and each attendee will have an opportunity to win a door prize.