Thursday, February 15 | ||
Registration
|
Thu, Feb 15, 7:00 AM - 6:30 PM
|
|
|
||
SC1
Introduction to Big Data Analysis
|
Thu, Feb 15, 8:00 AM - 5:30 PM
Salon A |
|
Instructor(s): Fulya Gokalp Yavuz, Yildiz Technical University; Mark Daniel Ward, Purdue University | ||
This one-day introductory workshop is geared toward CSP participants who want to revitalize or improve their data analysis skills, especially with an emphasis on big data. Ward and Gokalp Yavuz will present tools and techniques for the most fundamental, low-level aspects of data analysis. We are well versed in teaching such techniques to students who have no background in data analysis or programming. This workshop will bring people up to speed with powerful techniques for data analysis. This one-day course has no prerequisites. This workshop will be hands-on and driven by examples, using large data sets. The intended participants for the course are people who work in a data-driven environment and have an increasing need to perform aspects of large data analysis. Before data are gathered and organized, a great deal of data manipulation is necessary, especially for working with big data sets. Sometimes the data need to be scraped from remote sources and then parsed into more natural forms. This process often involves munging and cleaning the data. The need to be able to reproduce and reliably verify all of the methods used for the data wrangling is more important than ever.
|
||
|
||
SC2
An Introduction to D3.js: From Scattered to Scatterplot
|
Thu, Feb 15, 8:00 AM - 5:30 PM
Salon B |
|
Instructor(s): Scott Murray, O’Reilly Media
|
||
Interested in coding data visualizations on the web, but don't know where to start? This workshop will have you transforming data into visual images in no time at all, starting from scratch and building an interactive scatterplot by the end of the session. We'll use d3.js, the web's most powerful library for data visualization, to load data and translate values into SVG elements — drawing lines, points, and scaled axes to label our data. We’ll learn how to use motion and visual transitions, and introduce simple interactivity to make our charts more explorable. All methods and examples will be up-to-date for the current version of D3 (4.x as of this writing).
|
||
|
||
SC3
Collaboration Essentials for Practicing Statisticians and Data Scientists
|
Thu, Feb 15, 8:00 AM - 12:00 PM
Salon C |
|
Instructor(s): Heather Smith, Cal Poly; Eric Vance, LISA--University of Colorado Boulder
|
||
Statisticians and data scientists positively impact many people, organizations, and governments through the careful collection, analysis, and interpretation of data to solve problems and make decisions. To maximize their impact, statisticians and data scientists must effectively collaborate with a variety of domain experts who originate the data or the problems to be solved. In this short course, participants will learn and practice essential skills to improve their professional communication and collaboration to increase their effectiveness on the job. Specifically, participants will learn how to establish foundational collaborative relationships with domain experts; structure effective meetings; and effectively communicate with non-statisticians. Participants will also practice their newly acquired skills and learn how to improve their proficiency in these essential collaboration skills by using role-plays and video coaching and feedback reviews outside of this short course. In sum, participants will learn and practice how to leverage their technical skills to more effectively collaborate for maximal impact inside and outside of their organizations.
|
||
|
||
SC4
A Variety of Mixed Models: Linear, Generalized Linear, and Nonlinear
|
Thu, Feb 15, 8:00 AM - 12:00 PM
Salon E |
|
Instructor(s): David A. Dickey, NC State University
|
||
The MIXED procedure in SAS, for example, correctly handles linear models that have multiple sources of random effects such as random town-to-town, store-to-store, and aisle-to-aisle variation in sales. Associated fixed effects might be product price, color of packaging, and amount spent on advertising. The talk begins with a checklist for deciding when to treat effects as random versus fixed and follows with a series of examples. When the response variable is not normal, for example with a binary or Poisson response, additional complexities arise. Models with such non-normal responses are often analyzed by assuming that some transformation, or link function, of the expected value of Y results in a linear model with fixed and random effects. We are then in the generalized linear mixed model setting. It may be that a model cannot be linearized by a transformation, thus making it a nonlinear model. If random effects are involved, the model is referred to as a nonlinear mixed model. With a minimal amount of theory and an emphasis on examples, these types of models will be explained and illustrated. SAS will be used, but the ideas and interpretation are software independent.
|
||
|
||
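The link-function idea in the SC4 description above can be illustrated with a minimal Python sketch. It assumes a logit link, one fixed effect (price), and one random intercept per store. The coefficient values and the names `BETA0`, `BETA_PRICE`, and `store_effects` are invented for illustration and are not taken from the course materials.

```python
import math

# Hypothetical fixed-effect coefficients (illustrative values only).
BETA0 = -1.0        # intercept on the logit scale
BETA_PRICE = -0.5   # effect of product price on the logit scale

# Hypothetical random intercepts, one per store. In a fitted model these
# would be predictions of effects assumed to follow N(0, sigma^2).
store_effects = {"store_a": 0.8, "store_b": -0.3}


def linear_predictor(price, store):
    """Fixed part plus the store's random intercept, on the link scale."""
    return BETA0 + BETA_PRICE * price + store_effects[store]


def inverse_logit(eta):
    """Map a linear predictor back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def purchase_probability(price, store):
    """Expected value of a binary response under the logit link."""
    return inverse_logit(linear_predictor(price, store))
```

Calling `purchase_probability(2.0, "store_a")` returns a probability between 0 and 1. The model is linear only on the logit scale, which is what distinguishes this generalized linear mixed model setting from the linear case.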
SC5
Cleaning Up the Data Cleaning Process: Challenges and Solutions in R
|
Thu, Feb 15, 8:00 AM - 12:00 PM
Salon D |
|
Instructor(s): Claus Thorn Ekstrøm, Biostatistics, University of Copenhagen; Anne Helby Petersen, Biostatistics, University of Copenhagen | ||
Data cleaning and validation are the first steps in any data analysis, as the validity of the conclusions from the analysis hinges on the quality of the input data. Mistakes in the data can arise for any number of reasons, including erroneous codings, malfunctioning measurement equipment, and inconsistent data generation manuals. We present a systematic, analytical approach to data cleaning that makes the data cleaning process just as structured and well-documented as the rest of the data analysis. The primary software tool is the dataMaid R package, which implements an extensive and customisable suite of quality assessment tools that can be used to identify potential problems in a dataset. The results are summarised in an auto-generated, non-technical, stand-alone document readable by statisticians and non-statisticians alike. Thus, the course teaches practical skills that aid the dialogue between data analysts and field experts, while also providing easy documentation of reproducible data cleaning steps and data quality control.
|
||
|
||
SC6
Effective Presentation for Statisticians and Data Scientists: Success=(PD)^2
|
Thu, Feb 15, 1:30 PM - 5:30 PM
Salon C |
|
Instructor(s): Jennifer H. Van Mullekom, Virginia Tech
|
||
Statisticians must be able to effectively convey their ideas to clients, collaborators, and decision-makers. Presenting in the modern world is even more daunting when speakers have the opportunity to employ slideware, videos, and live demos. Unfortunately, university coursework and professional development programs are often not targeted towards sharpening these skills. This short course, developed and taught by statisticians, will provide an opportunity to learn how to employ different methods and tools in the phases of the framework taught. The material covered in the course is geared toward data-based presentations and is based on the works of Garr Reynolds and Michael Alley, among others. The course will emphasize the importance of stepping away from the computer to Prepare an effective message aimed at your core point guided with a series of questions and tips. The Design phase emphasizes the importance of structure, streamlining, and good graphic design accompanied by a series of checklists. Of course, “Practice makes perfect” so we cannot skip this step. Finally, engaging the audience and effectively using the room and equipment is covered in the Deliver phase.
|
||
|
||
SC7
Statistical Learning Methods in R
|
Thu, Feb 15, 1:30 PM - 5:30 PM
Salon E |
|
Instructor(s): Kelly Sue McConville, Swarthmore College
|
||
Applied statisticians are often confronted with difficult modeling problems where standard regression approaches are not appropriate. For example, it may be that the number of possible predictors is large relative to the sample size or that the relationship between the variables is non-linear. This course will cover several statistical learning techniques that are designed to handle these difficult modeling problems. In particular, we will study penalized regression techniques (lasso, ridge, elastic net), non-parametric regression (regression and smoothing splines), and classification methods (support vector machines, trees). Using data from the Bureau of Labor Statistics, participants will learn how to fit these models in R. R Markdown files with the relevant code will be provided so that participants can actively follow along with the demonstrations.
|
||
|
||
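As a rough illustration of the penalized regression techniques named in the SC7 description, the sketch below compares ordinary least squares, ridge, and lasso slopes for a single predictor with no intercept. The course itself uses R. This Python version, the function names, and the particular scaling of the penalty terms (stated in each docstring) are assumptions made only for this example.

```python
def _sums(x, y):
    """Return (sum of x*y, sum of x*x) for two equal-length sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("x must not be all zeros")
    return sxy, sxx


def ols_slope(x, y):
    """Minimize 0.5 * sum((y - b*x)^2)."""
    sxy, sxx = _sums(x, y)
    return sxy / sxx


def ridge_slope(x, y, lam):
    """Minimize 0.5 * sum((y - b*x)^2) + 0.5 * lam * b^2."""
    sxy, sxx = _sums(x, y)
    return sxy / (sxx + lam)


def lasso_slope(x, y, lam):
    """Minimize 0.5 * sum((y - b*x)^2) + lam * abs(b) by soft-thresholding."""
    sxy, sxx = _sums(x, y)
    if sxy > lam:
        return (sxy - lam) / sxx
    if sxy < -lam:
        return (sxy + lam) / sxx
    return 0.0  # the penalty is large enough to shrink the slope to zero
```

Ridge shrinks the slope smoothly toward zero as `lam` grows, while lasso sets it exactly to zero once `lam` exceeds `abs(sum(x*y))`. That is why lasso is also used for variable selection when there are many candidate predictors.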
SC8
NISS Shortcourse: A Survey of Modern Data Science
|
Thu, Feb 15, 1:30 PM - 5:30 PM
Salon D |
|
Instructor(s): David Banks, Dept. of Statistical Science, Duke University
|
||
Modern data science is driven by applications, and these often entail Big Data and machine learning perspectives. This short course reviews key ideas and methods in nonparametric regression (starting with cross-validation and light bootstrap asymptotics, then moving on to the additive model, the generalized additive model, and neural networks). It also covers variable selection, with the Lasso and the Median Model, and describes the p >> n problem in the context of contributions by Candes and Tao, Donoho and Tanner, and Wainwright. The course next treats classification, with emphasis upon Random Forests, boosting, and ensemble strategies such as bagging, stacking, and boosting.
|
||
|
||
PS1
Poster Session 1 and Opening Mixer
|
Thu, Feb 15, 5:30 PM - 7:00 PM
Salons F-I |
|
Chair(s): Alok Dwivedi, Texas Tech University Health Sciences Center El Paso (TTUHSC EP) | ||
|
||
1 Some Dimension Reduction Strategies for the Analysis of Survey Data
|
||
2 Perl-Compatible Regular Expressions as a Tool to Abstract Semi-Structured Electronic Health Records
|
||
3 Collaborative Process to Efficiently Produce Publications in Multicenter Research
|
||
4 Developing a Comprehensive Personal Plan for Teleworking (Working Remotely)
|
||
5 Thank You, Come Again: Modeling Repeat Purchase Behavior for Business Travelers
|
||
6 Wavelet-Based Methods for Data-Driven Monitoring
|
||
7 A Simulation Study of Violations of the Local Independence Assumption in Latent Class Analyses
|
||
8 Impact of Linear Regression Predictor Omission on Estimation and Inference
|
||
10 A Comparison of Standard Logistic Regression, Multilevel Modeling, Robust Error Estimation, and Exposure Simulation for Data Containing Quasi-Berkson Error
|
||
11 Statistical Modeling for Repeated Measures in Rubber Research
|
||
12 Combining Historical Data and Propensity Score Methods in Observational Studies to Improve Internal Validity
Miguel Marino, Oregon Health & Science University |
||
13 Marketing Communication Channel Preference Optimization Using a Two-Stage Statistical Modeling
|
||
14 Limitations of Propensity Score Methods: Demonstration Using a Real-World Example
|
||
15 Effect Size Measures for Nonlinear Count Regression Models
|
||
16 Appropriate Dimension Reduction for Sparse, High-Dimensional Data Using Intensity Plots and Other Visualizations
|
||
17 Navigating Large-Scale Forest Plots Using R and Shiny
|
||
18 Ranked-Choice Voting R Package
|
||
Exhibits Open
|
Thu, Feb 15, 5:30 PM - 7:00 PM
Salons F-I |
|
|
||
Friday, February 16 | ||
Registration
|
Fri, Feb 16, 7:30 AM - 5:30 PM
|
|
|
||
Continental Breakfast
|
Fri, Feb 16, 7:30 AM - 8:30 AM
Salons F-I |
|
|
||
Exhibits Open
|
Fri, Feb 16, 7:30 AM - 6:30 PM
Salons F-I |
|
|
||
GS1
Keynote Address
|
Fri, Feb 16, 8:00 AM - 9:00 AM
Salon E |
|
Chair(s): Kim Love, K. R. Love Quantitative Consulting and Collaboration | ||
|
||
8:05 AM |
Reflections on Career Opportunities and Leadership in Statistics
Lisa LaVange, The University of North Carolina |
|
CS01
#LeadWithStatistics
|
Fri, Feb 16, 9:15 AM - 10:45 AM
Salon A |
|
Chair(s): Sejong Bae, Comprehensive Cancer Center, University of Alabama | ||
|
||
9:20 AM |
Q&A with Lisa LaVange
Lisa LaVange, The University of North Carolina |
|
10:05 AM |
Developing and Delegating: Two Key Strategies to Master as a Technical Leader
|
|
CS02
Practical Considerations for Modeling
|
Fri, Feb 16, 9:15 AM - 10:45 AM
Salons BC |
|
Chair(s): Trijya Singh, Le Moyne College | ||
|
||
9:20 AM |
Evaluating Model Fit for Predictive Validity
Katherine M. Wright, Northwestern University |
|
10:05 AM |
Flexible Modeling and Experimental Design Strategies
Timothy E. O'Brien, Loyola University Chicago |
|
CS03
Text Analytics Applications
|
Fri, Feb 16, 9:15 AM - 10:45 AM
Salon D |
|
Chair(s): Steven Cohen, RTI International | ||
|
||
9:20 AM |
Approachable, Interpretable Tools for Mining and Summarizing Large Text Corpora in R
|
|
10:05 AM |
Latent Dirichlet Allocation Topic Models Applied to the Center for Disease Control and Prevention’s Grant
|
|
CS04
Working with Messy Data
|
Fri, Feb 16, 9:15 AM - 10:45 AM
Salon E |
|
Chair(s): Karol Krotki, RTI | ||
|
||
9:20 AM |
Practical Time-Series Clustering for Messy Data in R
|
|
10:05 AM |
Doing Data Linkage: A Behind-the-Scenes Look
|
|
CS05
Collaboration Essentials
|
Fri, Feb 16, 11:00 AM - 12:30 PM
Salon A |
|
Chair(s): Terrie Vasilopoulos, University of Florida, College of Medicine | ||
|
||
11:05 AM |
Asking Great Questions
|
|
11:50 AM |
Listening, Paraphrasing, and Summarizing
|
|
CS06
Bayesian Applications
|
Fri, Feb 16, 11:00 AM - 12:30 PM
Salons BC |
|
Chair(s): Mariangela Guidolin, Department of Statistical Sciences, University of Padua | ||
|
||
11:05 AM |
Bayesian Inference for Stochastic Processes
|
|
11:50 AM |
Forecasting Periodic Accumulating Processes with Semiparametric Distributional Regression Models and Bayesian Updates
Harlan D. Harris, WayUp |
|
CS07
Exploring Big Data
|
Fri, Feb 16, 11:00 AM - 12:30 PM
Salon D |
|
Chair(s): Christina Phan Knudson, University of St. Thomas | ||
|
||
11:05 AM |
Exploratory Data Structure Comparisons by Use of Principal Component Analysis
|
|
11:50 AM |
Tools for Exploratory Data Analysis
|
|
CS08
Streamlining Your Work Using Apps
|
Fri, Feb 16, 11:00 AM - 12:30 PM
Salon E |
|
Chair(s): Blake Langlais, Mayo Clinic | ||
|
||
11:05 AM |
Mechanizing Clinical Review Processes with R Shiny for Efficiency and Standardization
|
|
11:50 AM |
Building Shiny Apps: With Great Power Comes Great Responsibility
|
|
Lunch (On Own)
|
Fri, Feb 16, 12:30 PM - 2:00 PM
|
|
|
||
CS09
Presenting and Storytelling
|
Fri, Feb 16, 2:00 PM - 3:30 PM
Salon A |
|
Chair(s): Shasha Bai, University of Arkansas for Medical Sciences | ||
|
||
2:05 PM |
How to Give a Really Awful Presentation
|
|
2:50 PM |
Telling the Story of Your Stats
|
|
CS10
Propensity Scores and Resampling Methods
|
Fri, Feb 16, 2:00 PM - 3:30 PM
Salons BC |
|
Chair(s): Christine Wells, UCLA Statistical Consulting Group | ||
|
||
2:05 PM |
CANCELED: A Streamlined Process for Conducting a Propensity Score-Based Analysis
John A. Craycroft, University of Louisville |
|
2:50 PM |
Resampling Methods for Statistical Inference on Multi-Rater Kappas
Chia-Ling Kuo, University of Connecticut Health |
|
CS11
Data Mining Algorithms
|
Fri, Feb 16, 2:00 PM - 3:30 PM
Salon D |
|
Chair(s): Abbass Sharif, University of Southern California | ||
|
||
2:05 PM |
Stochastic Gradient Boosting on Distributed Data
|
|
2:50 PM |
Deep Neural Networks for Scalable Prediction
|
|
CS12
Education to Practice and Data Visualization
|
Fri, Feb 16, 2:00 PM - 3:30 PM
Salon E |
|
Chair(s): Chester Ismay, DataCamp | ||
|
||
2:05 PM |
What Is Happening at the School Level and Why It Is Important to Statistical Practice
|
|
2:50 PM |
The Life-Cycle of a Project: Visualizing Data from Start to Finish
|
|
CS13
Managing Up
|
Fri, Feb 16, 3:45 PM - 5:15 PM
Salon A |
|
Chair(s): Ronald Gangnon, University of Wisconsin School of Medicine and Public Health | ||
|
||
3:50 PM |
What Does It Take for an Organization to Make Difficult Information-Based Decisions? Using the Oregon Department of Forestry’s RipStream Project as a Case Study
|
|
4:35 PM |
Statistics for Management of an Organization
Joyce Nilsson Orsini, Fordham University Graduate School of Business |
|
CS14
Working with Health Care Data
|
Fri, Feb 16, 3:45 PM - 5:15 PM
Salons BC |
|
Chair(s): Melanie Edwards, Exponent | ||
|
||
3:50 PM |
Application of Support Vector Machine Modeling and Graph Theory Metrics for Disease Classification
|
|
4:35 PM |
Assessing Correspondence Between Two Data Sources Across Categorical Covariates with Missing Data: Application to Electronic Health Records
|
|
CS15
Statisticians Teaching
|
Fri, Feb 16, 3:45 PM - 5:15 PM
Salon D |
|
Chair(s): Georgette Asherman, Direct Effects, LLC | ||
|
||
3:50 PM |
Should I Bring a Basket of Fish or Some Fishing Poles?
|
|
4:35 PM |
Engaging Undergraduates in Statistical Consulting
|
|
CS16
Novel Applications of Data Visualization
|
Fri, Feb 16, 3:45 PM - 5:15 PM
Salon E |
|
Chair(s): Mary Grace Crissey, CSRA | ||
|
||
3:50 PM |
Warranty/Performance Text Exploration for Modern Reliability
|
|
4:35 PM |
Improving the Data Customer’s Ability to Visualize Historical Agricultural Data at the National Agricultural Statistics Service
Irwin Anolik, USDA-NASS |
|
PS2
Poster Session 2 and Refreshments
|
Fri, Feb 16, 5:15 PM - 6:30 PM
Salons F-I |
|
Chair(s): S. Keith Anderson, Mayo Clinic | ||
|
||
1 Data, Data Everywhere …, but Mind the Disclaimers: Benefits and Costs of Matching Large Cohorts to Individual US Mortality Case Data in the NDI, SSA Death Master File (DMF/SSDI), and More
|
||
2 Curating and Visualizing Big Data from Wearable Activity Trackers
|
||
3 Consensus Strategy for Variable Selection in Clinical Prediction Rule Development
|
||
4 Reproducible Research Implemented Through Version Control Systems
|
||
5 The Boeing Applied Statistics ToolKit: Best Practices and Tools for Collaboration and Reproducibility in High-Throughput Consulting
|
||
6 Empirical Comparisons of Differential Expression Analysis Pipelines for RNA-Sequencing Data
|
||
7 A Practical Guide for Modeling Length of Stay with Focus on Right Skewness and Zero Inflation
|
||
8 Nonparametric Estimation of Time-Variant Quantiles and Statistical Models
|
||
9 Estimating the Relative Excess Risk Due to Interaction in Clustered Data Settings
|
||
10 Spatial Analysis of Fukushima Thyroid Ultrasound Examination Survey Data
|
||
11 A Growth Reference for Mid-Upper-Arm Circumference for Age Among School-Age Children and Adolescents, with Validation for Mortality in Two Cohorts
|
||
12 Machine Learning Methods for Predicting Zygosity
|
||
13 Simulating Real-World Data with Time-Varying Variables
|
||
14 Evaluating the Effectiveness of the Flipped Classroom Model Using Structural Equation Modeling
|
||
15 Software for Covariate Specification in Linear, Logistic, and Survival Regression
|
||
16 Exploratory Analyses from Different Forms of Interactive Visualizations
Lata Kodali, Virginia Tech |
||
17 Using SAS Programming to Create Complex Paneled Graphs from Electronic Health Records
|
||
18 An Algorithm to Identify Family Linkages Using Electronic Health Record Data
|
||
Saturday, February 17 | ||
Registration
|
Sat, Feb 17, 7:30 AM - 2:30 PM
|
|
|
||
Exhibits Open
|
Sat, Feb 17, 7:30 AM - 1:00 PM
Salons F-I |
|
|
||
PS3
Poster Session 3 and Continental Breakfast
|
Sat, Feb 17, 8:00 AM - 9:15 AM
Salons F-I |
|
Chair(s): Edward Mulrow, NORC at the University of Chicago | ||
|
||
1 Thematic Feature Selection for Research Support
|
||
2 Systematizing Your Statistical Consulting Practice
|
||
3 Sixteen Personalities at Work
|
||
4 Re-Examining Sick Quitter Hypothesis on Association of Alcohol Consumption with Coronary Heart Disease
|
||
5 Comparisons of Propensity Score Analysis for Analyzing Rare Binary Outcome
|
||
6 Understanding Graduate School Speed-Dating with Generalized Linear Mixed Models
|
||
7 Data Modeling to Mitigate the Impact of Missing Data in a Longitudinal Study of Injecting Drug Users
|
||
8 Multivariate Statistical Analysis in Plastic Foam Research
|
||
9 Win Ratio Application for a Composite Outcome in a Randomized Cardiovascular Trial
|
||
10 Statistical Analysis of Network Change
|
||
11 Exploring Data Quality and Time Series Event Detection in 2016 US Presidential Election Polls
|
||
12 Understanding and Using Ordinal Factor Analysis
Nivedita Bhaktha, The Ohio State University |
||
13 An Easy-to-Use SAS® Macro for a Descriptive Statistics Table with P-Values
|
||
14 Animated Data Visualization with Plotly: Useful Tool for Health Care Quality Improvement
|
||
15 Using Accessible Patient Data to Individualize Sample Timing for Pharmacokinetic Studies
|
||
CS17
Passion for Statistics
|
Sat, Feb 17, 9:15 AM - 10:45 AM
Salon A |
|
Chair(s): Kathleen A. Jablonski, The George Washington University | ||
|
||
9:20 AM |
Am I Supposed to Enjoy My Job? Career Observations from a Biostatistician
|
|
10:05 AM |
Statistics in the Wild: Practicing Statistics in Nontraditional Places, from a Tiny Island in the Pacific to the Federal Cabinet
Heather Krause, Datassist |
|
CS18
Survival Analysis v. 'Survival' Analysis
|
Sat, Feb 17, 9:15 AM - 10:45 AM
Salons BC |
|
Chair(s): Yulia Marchenko, StataCorp LLC | ||
|
||
9:20 AM |
'How Long Would You Wait?' Using Time-to-Event (Survival) Analysis to Explore Waiting Times
|
|
10:05 AM |
Statistical Methods for National Security Risk Quantification and Optimal Resource Allocation
|
|
CS19
Business Intelligence Applications
|
Sat, Feb 17, 9:15 AM - 10:45 AM
Salon D |
|
Chair(s): Chris Holloman, ICC | ||
|
||
9:20 AM |
Business Intelligence (BI) Reporting Solution: From Source to Nuts
|
|
10:05 AM |
Location Analytics: An Application of GIS
Moxie Zhang, Esri (China) |
|
CS20
Understanding Populations
|
Sat, Feb 17, 9:15 AM - 10:45 AM
Salon E |
|
Chair(s): Shelley DeVost, Los Angeles LGBT Center | ||
|
||
9:20 AM |
Quantifying Populations in Proximity to Oil and Gas Development: A National Spatial Analysis and Review
|
|
10:05 AM |
Approaches and Techniques for Estimating the Total Number of Species in a Population, with Emphasis on Application to Mineral Species
|
|
CS21
Developing Communication Skills
|
Sat, Feb 17, 11:00 AM - 12:30 PM
Salon A |
|
Chair(s): Cynthia R. Long, Palmer Center for Chiropractic Research, Palmer College of Chiropractic | ||
|
||
11:05 AM |
How to Communicate Statistics, and How Statisticians Should Communicate
|
|
11:50 AM |
PANEL: Communication Skills: What's Next
Lillian S. Lin, Montana State University; Kim Love, K. R. Love Quantitative Consulting and Collaboration; Alicia Toledano, Biostatistics Consulting, LLC; Eric Vance, LISA--University of Colorado Boulder |
|
CS22
Small Sample Sizes and Non-Probability Sampling
|
Sat, Feb 17, 11:00 AM - 12:30 PM
Salons BC |
|
Chair(s): Amy Laird, Oregon Clinical & Translational Research Institute (OCTRI) | ||
|
||
11:05 AM |
Quantifying and Incorporating Sources of Variability and Uncertainty in Statistical Analyses with Very Small Sample Sizes
|
|
11:50 AM |
Non-Probability Sampling: Wave of the Future in Survey Research?
|
|
CS23
Data Science Applications
|
Sat, Feb 17, 11:00 AM - 12:30 PM
Salon D |
|
Chair(s): Sarah Burgoyne, Claritas | ||
|
||
11:05 AM |
Recent Advances in the Analysis and Detection of Communities in a Network
Frederick Kin Hing Phoa, Institute of Statistical Science, Academia Sinica |
|
11:50 AM |
Firehose Data Science: Real-Time Analytics of Twitter Feeds
|
|
CS24
Causal Inference
|
Sat, Feb 17, 11:00 AM - 12:30 PM
Salon E |
|
Chair(s): Larisa G. Tereshchenko, Oregon Health and Science University | ||
|
||
11:05 AM |
Causal Inference with Multilevel Data Structures
Luke Keele, Georgetown |
|
11:50 AM |
A Decision Tool for Causal Inference and Observational Data Analysis Methods in Comparative Effectiveness Research (DECODE CER)
|
|
Lunch (On Own)
|
Sat, Feb 17, 12:30 PM - 2:00 PM
|
|
|
||
PCD1
Deploying Quantitative Models as 'Visuals' in Popular Data Visualization Platforms
|
Sat, Feb 17, 2:00 PM - 4:00 PM
Salon E |
|
Instructor(s): Daniel Fylstra, Frontline Systems Inc. | ||
Data visualization and business intelligence tools such as Tableau and Power BI have become extremely popular in recent years. Tableau reports that over 90% of Fortune 500 companies are now customers, while Microsoft reports that over 200,000 organizations of all sizes are using Power BI. These tools currently offer easy-to-use access to many data sources, powerful facilities for "slicing and dicing" data, and rich, flexible data visualization, but only limited built-in analytics methods. A new avenue has emerged in the past year for extending analytics methods in both Tableau and Power BI, and this provides a new way for an analyst to develop quantitative models outside these platforms, then deploy them as 'visuals' inside Tableau and Power BI, in 'dashboards' which are often published for use by thousands of users in an organization. Though originally conceived as a way to extend the range of visualization styles, these components can perform arbitrary computations on data before it is rendered in visual form. In this session, Excel Solver developer Frontline Systems, one of the first to explore this new avenue, will demonstrate use of its tools to automatically convert existing quantitative models into 'visuals' for both Tableau and Power BI. Among other options, this enables an analyst to convert a predictive (data mining, machine learning) or prescriptive (optimization, simulation) model from Microsoft Excel into an easily deployed 'visual' with just two mouse clicks. No programming is required, but the ability to extend models using high-level RASON modeling language code or programming language code is available. These 'visuals' are full-fledged models that easily connect to any Tableau or Power BI data source and re-solve the underlying problem whenever the data sources are refreshed.
|
||
PCD2
Handling Missing Data Using Multiple Imputation
|
Sat, Feb 17, 2:00 PM - 4:00 PM
Salons BC |
|
Instructor(s): Yulia Marchenko, StataCorp LLC | ||
This workshop will cover the use of Stata to perform multiple-imputation analysis. Multiple imputation (MI) is a simulation-based technique for handling missing data. The course will provide a brief introduction to multiple imputation and will demonstrate how to perform multiple imputation in Stata. The three stages of MI (imputation, completed-data analysis, and pooling) will be discussed with accompanying Stata examples. Imputation using multivariate normal (MVN) and using chained equations (MICE, FCS) will be discussed. A number of examples demonstrating how to efficiently manage multiply imputed data within Stata will also be provided. Linear and logistic regression analysis of multiply imputed data as well as several postestimation features will be presented. No prior knowledge of Stata is required, but basic familiarity with multiple imputation will prove useful.
|
||
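The pooling stage mentioned in the PCD2 description is commonly carried out with Rubin's rules. The sketch below is a minimal Python illustration, not the Stata workflow taught in the session. The function name `pool_rubin` and its inputs are hypothetical, and it assumes each completed data set has already produced a point estimate and its squared standard error.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m completed-data results with Rubin's rules.

    estimates: point estimates, one per imputed data set
    variances: squared standard errors, one per imputed data set
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching inputs")
    pooled = mean(estimates)       # average of the m point estimates
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # sample variance across imputations (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total
```

For example, `pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])` averages the three estimates and inflates the within-imputation variance by the between-imputation spread. That extra term is how MI reflects the added uncertainty due to the missing data.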
T1
Engage the Room: Mastering Your Personal Presentation Style
|
Sat, Feb 17, 2:00 PM - 4:00 PM
Salon A |
|
Instructor(s): Duncan Burl Gilles, Art of Problem Solving
|
||
As confident as we may be in the quality of our work, presentation can make or break the impact it has. Engaging the room and communicating clearly can make the difference between an unimpressed, bored audience and a thrilled audience eager to learn more. This course will focus on presentation techniques that help you communicate your ideas effectively and in an engaging manner. You’ll be trained on ways to draw your audience into your talk, engage them in active listening and thinking, and use your voice and the space of the room to command attention and convey your message. These are skills applicable in many areas – whether presenting your work to clients, teaching in the classroom, one-on-one interviews or discussions, and even CSP talks! After the talk, participants will have the chance to send a short video of a talk to the presenter for review and feedback.
|
||
|
||
T2
Applying Propensity Score Methods to Observational Studies Using R and SAS
|
Sat, Feb 17, 2:00 PM - 4:00 PM
Eugene |
|
Instructor(s): Wei Pan, Duke University
|
||
Observational studies are common in applied settings but pose threats to the validity of causal inference due to selection bias in the data. Propensity score methods have been increasingly used as a means of reducing selection bias to enhance the causal claims. A training course on the application of propensity score methods to observational studies using commonly used statistical software would be beneficial for applied statisticians and researchers to improve the quality of their observational studies. With this objective, the proposed course will introduce basic concepts and practical issues of propensity score methods, including matching, stratification, and weighting; the instructors will facilitate hands-on activities of applying propensity score methods to observational studies with real-world examples using R and SAS. No prior knowledge of propensity score methods or computer programming is required. Participants are encouraged to bring their own laptop computers for hands-on activities.
|
||
|
||
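Of the three approaches listed in the T2 description (matching, stratification, and weighting), weighting is the simplest to sketch. The Python example below is a minimal illustration, assuming the propensity scores have already been estimated elsewhere. The tutorial itself uses R and SAS, and the function name `ipw_ate` and the normalized (Hajek-style) form of the estimator are choices made for this example only.

```python
def ipw_ate(treated, outcome, propensity):
    """Estimate an average treatment effect by inverse probability weighting.

    treated:    1 for treated units, 0 for controls
    outcome:    observed outcome for each unit
    propensity: estimated probability of treatment for each unit
    Uses normalized weights, so each group mean is a weighted average.
    """
    if not (len(treated) == len(outcome) == len(propensity)):
        raise ValueError("inputs must have the same length")
    num_t = den_t = num_c = den_c = 0.0
    for t, y, e in zip(treated, outcome, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        if t == 1:
            w = 1.0 / e            # treated units weighted by 1 / e
            num_t += w * y
            den_t += w
        else:
            w = 1.0 / (1.0 - e)    # controls weighted by 1 / (1 - e)
            num_c += w * y
            den_c += w
    if den_t == 0.0 or den_c == 0.0:
        raise ValueError("need at least one treated and one control unit")
    return num_t / den_t - num_c / den_c
```

Units that were unlikely to receive the treatment they actually got receive larger weights, which rebalances the two groups on the covariates used to estimate the scores. Scores very close to 0 or 1 produce extreme weights, one of the practical issues such a course would typically address.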
T3
A Workshop on Validation of Discrete Response Statistical Models
|
Sat, Feb 17, 2:00 PM - 4:00 PM
Portland |
|
Instructor(s): Raul Eduardo Avelar Moran, Texas A&M Transportation Institute | ||
Count models are widely used to analyze discrete data in various fields. When the intent of the analysis is prediction, model validation is an important step before the model can be offered with confidence to final users. This tutorial will discuss when and why to validate, and will demonstrate model validation techniques specific to discrete response models, such as Poisson and Negative Binomial Generalized Linear Regression Models.
|
||
|
||
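One common way to validate a count model for prediction, in the spirit of the T3 description, is to score its predictions on held-out data with the Poisson deviance. The sketch below is a minimal Python illustration under that assumption. The function name `poisson_deviance` is hypothetical, and the tutorial may use different validation measures.

```python
import math


def poisson_deviance(observed, predicted):
    """Poisson deviance of predicted means against observed counts.

    observed:  non-negative counts from a holdout sample
    predicted: strictly positive predicted means from the fitted model
    Smaller values indicate predictions closer to the held-out counts.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    total = 0.0
    for y, mu in zip(observed, predicted):
        if y < 0 or mu <= 0:
            raise ValueError("counts must be >= 0 and predicted means > 0")
        # By convention, y * log(y / mu) is taken as 0 when y == 0.
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += log_term - (y - mu)
    return 2.0 * total
```

Comparing this quantity between the estimation sample and the holdout sample (per observation) gives a rough sense of how much predictive accuracy degrades on new data.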
T4
Tools for Connecting R, SAS, and Stata to Word: A Practical Approach to Reproducibility
|
Sat, Feb 17, 2:00 PM - 4:00 PM
Salon D |
|
Instructor(s): Abigail S. Baldridge, Northwestern University; Leah J. Welty, Northwestern University
|
||
Reproducibility, wherein data analysis and documentation are sufficient so that results can be recomputed or verified, is an increasingly important component of statistical practice. “Weaving” tools such as R Markdown facilitate reproducibility by combining narrative text and analysis code in one plain-text document, but are of limited use when manuscripts or reports must be generated in MS Word (e.g., due to journal requirements or client preference). This course will: (1) summarize how weaving tools create Word documents, and the ensuing limitations; and (2) introduce an alternate approach using recently released StatTag software. StatTag is a free, open-source program that embeds results (values, tables, figures, or verbatim output) from R, SAS, or Stata directly in Word such that they can be automatically updated if code or data changes. This course is intended for a broad audience; prerequisites are experience preparing documents in Word and conducting analysis in any one of R, SAS, or Stata. The workshop will provide practical, hands-on examples drawn from R, SAS, and Stata, and will include an overview of weaving approaches as well as an introduction to StatTag.
|
||
|
||
GS2
Closing General Session
|
Sat, Feb 17, 4:15 PM - 5:30 PM
Salon E |
|
Chair(s): Eric Vance, LISA--University of Colorado Boulder | ||
The Closing Session is an opportunity for you to interact with the CSP Steering Committee in an open discussion about how the conference went and how it could be improved in future years. CSPSC vice chair Eric Vance will lead a panel of committee members as they summarize their conference experience. The audience will then be invited to ask questions and provide feedback. The committee highly values suggestions for improvements gathered during this time. The award for best student poster will also be presented during the Closing Session, and each attendee will have an opportunity to win a door prize.
|
||