Saturday, February 16
Sat, Feb 16
7:30 AM - 2:30 PM
3rd Floor Registration Counter S
Registration
Registration
Sat, Feb 16
7:30 AM - 1:00 PM
St. James Ballroom
Exhibits Open
Exhibits
Sat, Feb 16
8:00 AM - 9:15 AM
St. James Ballroom
PS3 - Poster Session 3 and Continental Breakfast
Poster Session
Chair(s): Charles Minard, Baylor College of Medicine
Posters: 1, 3–8, 10–12, 14–19
Sat, Feb 16
9:15 AM - 10:45 AM
Camp
Chair(s): Caitlin Mary Cunningham, Le Moyne College
Sat, Feb 16
9:15 AM - 10:45 AM
Canal
Chair(s): Ella Revzin, Precima
Sat, Feb 16
9:15 AM - 10:45 AM
Jackson
Chair(s): Birol Emir, Pfizer Inc.
Sat, Feb 16
9:15 AM - 10:45 AM
Magazine
Chair(s): Naomi B. Robbins, NBR
Sat, Feb 16
11:00 AM - 12:30 PM
Camp
Chair(s): Kayéromi Gomez, University of Illinois College of Medicine
Sat, Feb 16
11:00 AM - 12:30 PM
Canal
CS22 - Behind the Model: Modeling Approaches and Strategies
Concurrent Session
Chair(s): Steven B. Cohen, RTI International
Sat, Feb 16
11:00 AM - 12:30 PM
Jackson
Chair(s): Raja Velu, Syracuse University
Sat, Feb 16
11:00 AM - 12:30 PM
Magazine
Chair(s): Sejong Bae, University of Alabama at Birmingham
Sat, Feb 16
12:30 PM - 2:00 PM
Lunch (On Own)
Other
Sat, Feb 16
2:00 PM - 4:00 PM
Camp
PCD1 - Introduction to Structural Equation Modeling Using Stata
Practical Computing Demo
Instructor(s): Chuck Huber, StataCorp
This workshop introduces the concepts and jargon of structural equation modeling (SEM), including path diagrams, latent variables, endogenous and exogenous variables, and goodness of fit. I will describe the similarities and differences between Stata's -sem- and -gsem- commands and then demonstrate how to fit many familiar models, such as linear regression, multivariate regression, logistic regression, confirmatory factor analysis, and multilevel models, using -sem- and -gsem-. I will conclude by demonstrating how to fit structural equation models that contain both structural and measurement components.
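Since the demo itself is Stata-based but this program's other hands-on sessions use R, here is a minimal sketch of an analogous confirmatory factor analysis in R using the lavaan package; this is an illustrative swap for orientation only, not the Stata -sem- syntax the demo teaches:

library(lavaan)
# Three-factor CFA on lavaan's built-in Holzinger-Swineford data:
# each latent variable (=~) is measured by three observed indicators.
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit <- cfa(model, data = HolzingerSwineford1939)
summary(fit, fit.measures = TRUE)  # estimates plus fit indices (CFI, RMSEA, ...)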
Outline & Objectives
Participants will learn about the following concepts and tools:
Observed and latent variables
Exogenous and endogenous variables
Recursive and nonrecursive models
Model assumptions
Checking the fit of a structural equation model
How to draw a path diagram using Stata’s SEM Builder
How to use Stata’s -sem- command syntax
How to use Stata’s -gsem- command syntax
Differences and similarities between -sem- and -gsem-
How to fit structural equation models by group
How to constrain model parameters
How to fit a mediation model using SEM
How to estimate descriptive statistics such as sample means, variances, and correlations with SEM
How to fit familiar models such as linear and logistic regression using SEM
How to fit confirmatory factor analysis (CFA) models using SEM
About the Instructor
Chuck Huber is a Senior Statistician at StataCorp and Adjunct Associate Professor of Biostatistics at the Texas A&M School of Public Health. In addition to working with Stata's team of software developers, he produces instructional videos for the Stata YouTube channel, writes blog entries, develops online NetCourses, and gives talks about Stata at conferences and universities. Most of his current work is focused on statistical methods used by psychologists and other behavioral scientists. He has published in the areas of neurology, human and animal genetics, alcohol and drug abuse prevention, nutrition, and birth defects. Dr. Huber currently teaches introductory biostatistics at Texas A&M, where he previously taught categorical data analysis, survey data analysis, and statistical genetics.
Relevance to Conference Goals
Structural equation modeling has become increasingly popular for modeling the interrelationships among a group of variables. Many researchers use SEM to understand causal relationships in complex systems. This talk introduces this powerful tool using the popular statistical package Stata.
Sat, Feb 16
2:00 PM - 4:00 PM
Jackson
PCD2 - Interfacing R with Excel in Two Different Ways
Practical Computing Demo
Thanks to its popularity and user-friendly environment, Microsoft Excel is widely used to gain data insights and make better decisions. However, compared to mainstream statistical software such as R, Excel lacks advanced statistical tools, whether used on their own or integrated into larger procedures. On the other hand, R is code-based software with a steep learning curve. To interface the unlimited statistical possibilities of R with the user-friendly environment of Excel, two features have recently been developed within the XLSTAT software: 1) XLSTAT-R helps programmers develop user-friendly dialog boxes in Excel, allowing users to launch customized R procedures directly on data selected in Excel with their mouse. 2) The XLSTAT-RNotebook allows writing R code in Excel cells, with the possibility of capturing data in the form of Excel cell ranges. The outputs are also displayed in Excel. This makes it possible to create complex dashboards or reports in Excel built from R code. The created procedures can then be used by colleagues, students, or clients who don't necessarily know how to code. This tutorial shows how developers can build customized R procedures in an Excel dialog box or directly in Excel cells using XLSTAT.
Basic coding skills are required (preferably R).
Outline & Objectives
Outline:
1. Introduction to XLSTAT-R and the XLSTAT-RNotebook.
2. Application: Making the pam{cluster} R function available in an Excel dialog box and adding the possibility to customize several options and charts from within the dialog box.
3. Application: Developing a customized R-based dashboard in an Excel sheet using the XLSTAT-RNotebook.
Objectives:
At the end of this tutorial, participants will understand the basics of XLSTAT-R or the XLSTAT-RNotebook, used to develop R-based statistical applications or dashboards in Excel.
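For orientation, a minimal R sketch of the pam() function from the cluster package that the first application wraps in an Excel dialog box; the dataset and options here are illustrative, not XLSTAT defaults:

library(cluster)  # provides pam(), partitioning around medoids
data(iris)
fit <- pam(iris[, 1:4], k = 3)       # cluster the four numeric measurements
table(fit$clustering, iris$Species)  # compare clusters to known species
plot(fit)                            # cluster plot and silhouette plot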
About the Instructor
Jean Paul Maalouf (PhD) is an independent statistical consultant with 10 years of experience. He worked for four years at Addinsoft as the brand manager of XLSTAT, a leading statistical add-on for Excel. He contributed substantially to the development of the XLSTAT-R engine and has created many of the default XLSTAT-R procedures included in XLSTAT solutions.
Relevance to Conference Goals
The open-source R software is known for its steep learning curve. Data-inspired decision makers often prefer relying on dashboards or user-friendly environments such as Microsoft Excel. This tutorial shows how data science, data analysis and modeling procedures built in R can be made available to any Excel user thanks to XLSTAT-R and the XLSTAT-RNotebook. These developments are possible under different collaboration scenarios. Chief programming statisticians are able to customize applications for decision makers. Consultants are able to set up Excel applications tailored to the specific needs of their customers. Professors are able to develop customized statistical programs in Excel to illustrate their courses.
Sat, Feb 16
2:00 PM - 4:00 PM
Royal
Increasingly complex observational studies are commonplace in numerous data science settings, including biomedical, health services, pharmaceutical, insurance, and online advertising. To adequately estimate causal effect sizes, proper control of known potential confounders is critical. Having gained enormous popularity in recent years, propensity score methods are powerful and elegant tools for estimating causal effects. Without assuming prior knowledge of propensity score methods, this short course will use simulated and real data examples to introduce and illustrate important techniques involving propensity scores, such as weighting, matching, and sub-classification. Relevant R and SAS software packages for implementing data analyses will be discussed in detail. Specific topics to be covered include guidelines on how to construct a propensity score model, create matched pairs for binary group comparisons, assess baseline covariate balance after matching, and use inverse propensity score weighting techniques. Illustrative examples will accompany each topic, and a brief review of recent relevant developments and their implementation will also be discussed.
Outline & Objectives
Outline:
- Observational Studies: definition, examples, causal effects, confounding control.
- Propensity Scores: definition, properties, modeling techniques.
- Propensity Score Approaches in Observational Studies: weighting, matching, sub-classification; graphical methods to assess covariate balance after matching;
- Illustration of these techniques using R packages MatchIt, Matching and optmatch, as well as SAS PROCs CAUSALTRT and PSMATCH.
- Guidelines on how to best describe the methodology utilized and the results obtained when presenting to a non-technical audience.
- Brief review of most recent methods developments and discussion of their potential for immediate use in practice.
Objectives: The first objective is to provide an example-centered overview of the most commonly used propensity score-based methods in observational studies. The second objective is to present the practical implementation of these methods and highlight the newly developed SAS PROCs CAUSALTRT and PSMATCH. The third objective is to discuss the advantages and disadvantages associated with these methods.
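As a taste of the implementations the course covers, a minimal R sketch of propensity score estimation, inverse weighting, and nearest-neighbor matching with the MatchIt package named in the outline; the data frame and variable names are hypothetical:

library(MatchIt)
# d is a hypothetical data frame with a binary treatment indicator `treat`
# and baseline covariates age and sex.
ps_model <- glm(treat ~ age + sex, data = d, family = binomial)
ps <- fitted(ps_model)                       # estimated propensity scores
d$w <- ifelse(d$treat == 1, 1 / ps, 1 / (1 - ps))  # inverse PS weights (ATE form)
# 1:1 nearest-neighbor matching on the propensity score.
m_out <- matchit(treat ~ age + sex, data = d, method = "nearest")
summary(m_out)                # covariate balance before and after matching
matched <- match.data(m_out)  # matched sample for the outcome analysis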
About the Instructor
Dr. Andrei received a Ph.D. degree in Biostatistics from the University of Michigan in 2005. He is currently an Associate Professor in the Department of Preventive Medicine at Northwestern University, where he enjoys successful collaborations in cardiovascular outcomes research. He has developed expertise in MSMs and published relevant studies in adult cardiac surgery. He has developed practice-inspired and -oriented statistical methods in survival analysis, recurrent events, group sequential monitoring methods, hierarchical methods, and predictive modeling. In the last 15 years, Dr. Andrei has collaborated with medical researchers in fields such as pulmonary/critical care, organ transplantation, nursing, prostate and breast cancer, anesthesiology and thoracic surgery. Currently, he serves as Statistical Co-Editor of the Journal of the American College of Surgeons and deputy Statistical Editor of the Journal of Thoracic and Cardiovascular Surgery.
Relevance to Conference Goals
Upon attending this short course, participants will gain familiarity with propensity score-based methods for estimating causal effects in observational studies. Implementation in R and SAS will be covered in detail, which will permit participants to integrate these useful data science techniques into their professional activities and projects. Learning how to produce simple yet powerful graphics to assess propensity score model adequacy, check covariate balance, and display the results will benefit every participant. By leveraging their enhanced set of skills, individuals across industries will be well positioned to become more effective communicators in their interactions with customers and clients. Continued professional development is key to one's career growth and can enhance the overall analytical capabilities of their respective organizations and institutions.
Sat, Feb 16
2:00 PM - 4:00 PM
Commerce
Instructor(s): Jim Harner, West Virginia University
This tutorial covers the data science process using R as a programming language and Spark as a big-data platform. Powerful workflows are developed for data extraction, data tidying and transformation, data modeling, and data visualization.
During the course, R-based examples show how data is transported from data sources into the Hadoop Distributed File System (HDFS), into relational databases, and directly into Spark's real-time compute engine. Workflows using `dplyr' verbs are used for data manipulation within R, within relational databases (PostgreSQL), and within Spark using `sparklyr'. These data-based workflows extend into machine learning algorithms, model evaluation, and data visualization.
The machine learning algorithms include supervised techniques such as linear regression, logistic regression, gradient-boosted trees, and random forests. Feature selection is done primarily by regularization and models are evaluated using various metrics. Unsupervised techniques include k-means clustering and dimension reduction.
Big-data architectures are discussed including the Docker containers used for building the short-course infrastructure called RSpark.
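For a flavor of the `sparklyr' workflow described above, a minimal sketch using a local Spark instance and R's built-in mtcars data; this is a generic illustration, not the course's RSpark exercises, and it assumes a local Spark installation:

library(sparklyr)
library(dplyr)
sc <- spark_connect(master = "local")            # local Spark for illustration
cars_tbl <- copy_to(sc, mtcars, overwrite = TRUE)
# dplyr verbs are translated to Spark SQL and executed inside Spark.
cars_tbl %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE))
# Fit a Spark MLlib linear regression via sparklyr's formula interface.
fit <- ml_linear_regression(cars_tbl, mpg ~ wt + cyl)
summary(fit)
spark_disconnect(sc)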
Outline & Objectives
Modules:
1. Fundamentals: Linux; RSpark; RStudio; Git; Data Science Process [20 min]
2. Data Sources: Text; JSON; PostgreSQL; Web [20 min]
3. Data Transformation: Data Cleaning; `tidyr'; `dplyr' [20 min]
4. Hadoop: HDFS as a Persistent Data Store for Spark [30 min]
5. `sparklyr': Spark DataFrames; `dplyr' Interface [30 min]
6. Supervised Learning: Regression and Classification Workflows with Spark [60 min]
7. Unsupervised Learning: Dimension Reduction and Clustering with Spark [30 min]
The first three modules will not be covered in detail since the focus is on the last four. However, modules 1–3 contain critical information for understanding the later modules.
The objectives of this course are to:
• extract static and streaming data from data sources,
• transform data into structured form,
• load data into relational and persistent, distributed data stores,
• build models using machine learning algorithms,
• validate and test models based on evaluation metrics,
• visualize big data and model metrics.
About the Instructor
E. James Harner is Professor Emeritus of Statistics and Adjunct Professor of Business Data Analytics at West Virginia University. He was the Chair of the Department of Statistics for 17 years and the Director of the Cancer Center Bioinformatics Core for 15 years. Currently, he is the Chairman of the Interface Foundation of North America which has partnered with the American Statistical Association to organize the annual Symposium on Data Science and Statistics (SDSS). The areas of his technical and research expertise include: bioinformatics, high-dimensional modeling, high-performance computing, streaming and big data modeling, and statistical machine learning.
This course is based on a two-day workshop developed for the National Institute of Statistical Sciences (NISS): https://www.niss.org. The two-day version has been successfully taught three times (at ASA headquarters and at UC Riverside in September 2017, and at the University of Toronto in April 2018). A one-day version of this course will be taught at the Symposium on Data Science and Statistics in May 2018 and at the Joint Statistical Meetings in August/September 2018.
Relevance to Conference Goals
Unlike many data science short courses, RSpark provides full big-data platforms (R, Hadoop, and Spark, together with their ecosystems). This is difficult for most instructors to offer since the infrastructure is hard to build. Thus, attendees will get a realistic taste of what data science really is.
The full data science process is taught, but the focus is on machine learning and the underlying R code. What is taught is a realistic representation of what is done in practice.
Communication of results is done through reproducible reports and data visualizations, which are often the endpoints of pipelines in R and Spark. Collaboration is primarily done using Git and GitHub, although code sharing within RStudio is also discussed. Data science in practice is almost always a team effort, and parts of this collaboration are taught.
This course offers a unique opportunity for professional development since a real data science platform is used. It is possible to scale RSpark using container orchestration, but the containers used within this course are essentially indistinguishable from a production environment.
Sat, Feb 16
2:00 PM - 4:00 PM
Canal
Instructor(s): Amy Yang, Uptake
It is time to take the next step and start wrapping all your utility functions that are scattered across numerous .R files into R packages, to help with code organization, distribution, and consistent documentation.
In this hands-on tutorial, I will introduce step-by-step how to build your very own R package. If you've used R, you've almost certainly used a package - but did you know that building your own package is actually not hard at all? If you have written bits of useful code you want to keep and return to, you might want a package.
After this session, participants will have the skills to start a package and document their functions, and resources to use for next steps like vignettes and unit testing. During the tutorial, participants can follow along using provided scripts.
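As a rough preview of those steps, a minimal sketch assuming the devtools/usethis toolchain; the tutorial's provided scripts may differ:

# Create the package skeleton and a file for your first function.
usethis::create_package("mypkg")
usethis::use_r("greet")
# ...write greet() in R/greet.R, then, from the package directory:
devtools::document()   # build help files from roxygen2 comments
devtools::load_all()   # load the package for interactive testing
devtools::check()      # run R CMD check
devtools::install()    # install it like any other package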
Outline & Objectives
This hands-on tutorial includes the following sections:
1. Setup R and install required packages
2. Create the framework for your package
3. Add functions to the package
4. External dependencies
5. Documentation
6. Install and use your package
7. (Bonus) Distribute your package on GitHub
About the Instructor
Amy Yang is a Sr. Data Scientist at Uptake, where she conducts industrial analytics and builds prediction models for major industries, helping them increase productivity, security, safety, and reliability. She began using R for simulation and statistical analysis during her studies at the University of Pennsylvania, where she received her MS degree in Biostatistics. She also teaches R programming and statistics courses for graduate students. You can find her on Twitter @ayanalytics.
Outside of work, Amy co-organizes the Chicago R-Ladies meetup group, where she helps promote R and invites women speakers from different data science fields to give talks. Her goal is to create a friendly network among women who use R!
Amy also mentors PhD and master students on their quantitative dissertations. She enjoys the teaching aspect of doing Data Science.
Relevance to Conference Goals
The tutorial is relevant to the conference theme and touches on these areas:
1. Communication and Collaboration
No more emailing .R scripts! An R package gives you an easy way to distribute code to others, especially if you put it on GitHub.
2. Consistent documentation
I can barely remember what half of my functions do, let alone their inputs and outputs. An R package provides a consistent documentation structure and actually encourages you to document your functions (see the sketch after this list).
3. Code Organization and reproducibility
Are you trying to figure out where that “function” you wrote months, weeks, or even days ago ended up? Often, people in statistics end up just re-writing it because that is faster than searching all the .R files. An R package helps you organize where your functions go.
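A minimal sketch of the roxygen2-style documentation that an R package encourages; the function shown is hypothetical:

#' Greet a person
#'
#' @param name Character string: the name to greet.
#' @return A character string containing the greeting.
#' @examples
#' greet("Ada")
#' @export
greet <- function(name) {
  paste0("Hello, ", name, "!")
}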
Sat, Feb 16
2:00 PM - 4:00 PM
Magazine
T4 - Simulation Design and Reporting with Applications to Drug Development
Tutorial
Instructor(s): Greg Cicconetti, AbbVie; Inna Perevozskaya, GlaxoSmithKline
Simulation methods have become an increasingly important tool in the search for more efficient clinical trial designs and/or statistical analysis procedures. During our short course we will provide a road map to developing and executing a successful simulation plan and communicating these results with a broader team. We will begin with a survey of problems one might encounter during the design, monitoring and analysis stages of a clinical trial for which a simulation study may provide some insight. We continue with an introduction to standard methods for generating random data. This discussion will include methods to mimic real-world data that do not adhere to standard statistical distributions, methods to introduce correlation among endpoints, parametric and non-parametric bootstrapping techniques, and the use of historic data to simulate future data. Having established this foundation, we return to some of our motivating problems and discuss their simulation-based solutions in greater depth. Though extensive R code will be provided to supplement this tutorial, our emphasis will be on the important concepts and principles of good simulation design and reporting.
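To make the flavor concrete, a minimal R sketch of simulating two correlated endpoints and estimating empirical power; the sample sizes, effect sizes, and correlation are illustrative, not taken from the course:

library(MASS)  # for mvrnorm()
set.seed(2019)
n_sims <- 1000; n_per_arm <- 100
Sigma <- matrix(c(1, 0.5, 0.5, 1), 2, 2)  # two endpoints, correlation 0.5
reject <- replicate(n_sims, {
  ctrl <- mvrnorm(n_per_arm, mu = c(0, 0),      Sigma = Sigma)
  trt  <- mvrnorm(n_per_arm, mu = c(0.3, 0.25), Sigma = Sigma)
  t.test(trt[, 1], ctrl[, 1])$p.value < 0.05  # test the first endpoint
})
mean(reject)  # empirical power under these assumed effect sizes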
Outline & Objectives
Tentative Course Outline: a subset of topics may be replaced with more contemporary materials
• Welcome and introduction
• Some motivation for simulation
• Modeling randomness
• Enrollment modeling
• Simulating correlated data
• An application using simulated correlated endpoints
• Leveraging historic data to aid in simulation
• Case study: Robustness of efficacy to early withdrawers in an outcomes study
• Case Study: Recurrent events
• Simulation Size – How large is large?
• Closing remarks
Course Objectives:
• Provide an introduction to statistical simulation
• Contrast theory and iterative problem solving
• Demonstrate simulation concepts via examples
• Simulation planning
• Communicating & drawing inferences from simulation
• Focus is not on coding and syntax or deep theory
About the Instructor
Greg Cicconetti, Ph.D., Statistical Innovations, Data and Statistical Sciences, AbbVie. Greg began his career as an assistant professor of statistics at Muhlenberg College before joining the pharmaceutical industry in 2005. In his roles at GlaxoSmithKline and AbbVie, Greg has gained extensive experience in survival and longitudinal trials, Bayesian methodology, and statistical learning. He has used simulation to guide teams regarding trial design, monitoring, and sensitivity analyses. In his current position Greg assists study teams in determining decision criteria to be used at interim analyses, effectively marrying simulation and visualization to build team consensus. Portions of the planned course material were delivered at the 2014 Deming Conference and also used in the graduate level Advanced Statistical Computing course at Drexel University taught by Greg in 2015. Greg is also a member of the DIA Scientific Working Group on Adaptive Designs and has participated in the development of a manuscript, along with other industry experts, advocating best practices in simulation reporting.
Relevance to Conference Goals
While this course is intended to be an introduction to simulation design and reporting, the attendee will be exposed to new statistical methodologies currently being employed to support on-going trials. Our discussion on simulation reporting will emphasize the importance of clearly articulating one's simulation design and summarizing pertinent simulation output in a way that facilitates collaboration with multiple stakeholders. Although we will use drug development and clinical trial design as a backdrop for explaining important simulation concepts, the core ideas presented should readily translate to those in other fields.
Sat, Feb 16
4:15 PM - 5:30 PM
Jackson
GS2 - Closing General Session
General Session
Chair(s): Kim Love, K. R. Love Quantitative Consulting and Collaboration
The Closing Session is an opportunity for you to interact with the CSP Steering Committee in an open discussion about how the conference went and how it could be improved in future years. CSPSC vice chair, Kim Love, will lead a panel of committee members as they summarize their conference experience. The audience will then be invited to ask questions and provide feedback. The committee highly values suggestions for improvements gathered during this time. The best student poster will also be awarded during the Closing Session, and each attendee will have an opportunity to win a door prize.