Online Program

Short Courses (half day)
Thursday, February 15
SC3 Collaboration Essentials for Practicing Statisticians and Data Scientists
Thu, Feb 15, 8:00 AM - 12:00 PM
Salon C
Instructor(s): Heather Smith, Cal Poly; Eric Vance, LISA--University of Colorado Boulder

Statisticians and data scientists positively impact many people, organizations, and governments through the careful collection, analysis, and interpretation of data to solve problems and make decisions. To maximize their impact, statisticians and data scientists must effectively collaborate with a variety of domain experts who originate the data or the problems to be solved. In this short course, participants will learn and practice essential skills to improve their professional communication and collaboration to increase their effectiveness on the job. Specifically, participants will learn how to establish foundational collaborative relationships with domain experts; structure effective meetings; and effectively communicate with non-statisticians. Participants will also practice their newly acquired skills and learn how to improve their proficiency in these essential collaboration skills by using role-plays and video coaching and feedback reviews outside of this short course. In sum, participants will learn and practice how to leverage their technical skills to more effectively collaborate for maximal impact inside and outside of their organizations.

Outline & Objectives

Our goal is to unlock the collaborative potential of participants (from beginning to advanced) so they can return to key roles within their organizations and achieve greater impact. Prerequisites are a desire to improve one’s personal effectiveness and openness to try new methods and ways of thinking in the practice of statistics and data science.

Outline and Objectives:
1. Learn how to build foundational collaborative relationships with clients, colleagues, and other domain experts by applying the Fundamental Law of Statistical Collaboration and the QQQ process.

2. Learn how to structure and conduct effective meetings using the POWER structure (Prepare-Open-Work-End-Reflect).

3. Analyze the opening and ending structures of a real meeting (on video) and/or a live role-play using rubrics.

4. Practice applying the POWER structure via focused role-plays with subsequent coaching and feedback.

5. Learn tips for effectively communicating with non-statisticians.

6. Practice listening, summarizing, and paraphrasing statistical and subject matter content; asking good questions; and explaining statistics to non-statisticians using the ADEPT method.

About the Instructor

For the past 9 years, Dr. Eric Vance, an Associate Professor at the University of Colorado, has been the director of LISA (Laboratory for Interdisciplinary Statistical Analysis) where he has trained 245 statisticians to move between theory and practice to collaborate with 9000+ domain experts to apply statistics and data science to answer their research or business questions. He has taught workshops and webinars on collaboration around the world, including three workshops on this topic at JSM from 2014-2016 with Heather Smith.


Heather Smith has 27 years of experience consulting with academic, industrial, service, and government clients in the United States, Europe, and Asia. She began this work as a statistical consultant at Westat, Inc. For 20 years she has been a faculty member in the Statistics Department at Cal Poly San Luis Obispo where she consults with academic and private sector researchers and teaches a wide variety of applied statistics courses including two courses she developed for undergraduate statistics majors, one in Statistical Communication and one in Statistical Consulting. She has offered over a dozen workshops, short courses, and webinars on these topics.

Relevance to Conference Goals

This short course is immediately relevant for two of the three main conference goals. If selected, this short course will teach participants how to better communicate and collaborate with their clients and customers, will provide them with skills and practice on how to have a positive impact on their organization, and will enhance their professional development.

Participants will learn best practices in statistical consulting and collaboration that will enhance their organizational impact and lead to career development and advancement. Participants will return to their jobs with new ideas, techniques, and strategies to improve their ability to communicate and collaborate effectively, resulting in a greater impact on their organizations and increasing the overall impact of statistics and data science in the world at large.

Note: this short course will not be offered at JSM in 2017 because organizers believed that it was a better fit for CSP.

Software Packages

We will not be using any software in this short course.

 
SC4 A Variety of Mixed Models: Linear, Generalized Linear, and Nonlinear
Thu, Feb 15, 8:00 AM - 12:00 PM
Salon E
Instructor(s): David A. Dickey, NC State University

The MIXED procedure in SAS, for example, correctly handles linear models that have multiple sources of random effects such as random town-to-town, store-to-store, and aisle-to-aisle variation in sales. Associated fixed effects might be product price, color of packaging, and amount spent on advertising. The talk begins with a checklist for deciding when to treat effects as random versus fixed and follows with a series of examples. When the response variable is not normal, for example with a binary or Poisson response, additional complexities arise. Models with such non-normal responses are often analyzed by assuming that some transformation, or link function, of the expected value of Y results in a linear model with fixed and random effects. We are then in the generalized linear mixed model setting. It may be that a model cannot be linearized by a transformation, thus making it a nonlinear model. If random effects are involved, the model is referred to as a nonlinear mixed model. With a minimal amount of theory and an emphasis on examples, these types of models will be explained and illustrated. SAS will be used but the ideas and interpretation are software independent.

Outline & Objectives

The presence of random effects in modelling can easily go unnoticed and yet it has profound effects on inference. For that reason this course shows, through a series of descriptions and examples, how to recognize this situation and deal with it. A surprising variety of models, such as split plots, unbalanced block designs, and repeated measures, to name a few, fall into the linear mixed models category. The impact of correctly incorporating random effects will be illustrated with a simple example. The slightly more complex cases of non-normal responses and nonlinear associations are also profoundly affected by the presence of random effects and examples of these will be included.
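To make the point about inference concrete, here is a minimal simulation sketch in Python. It is not the instructor's example and does not use SAS. The store count, the variance values, and the function names (`simulate_sales`, `naive_se`, `store_level_se`) are all assumptions chosen for illustration. It compares a standard error that treats every observation as independent with one that respects store-to-store variation in a balanced design.

```python
import random
import statistics


def simulate_sales(n_stores=10, n_per_store=20, store_sd=5.0, noise_sd=1.0, seed=1):
    """Simulate sales with a random store effect (all numbers are made up)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_stores):
        # One random effect is drawn per store and shared by all its observations.
        store_effect = rng.gauss(0.0, store_sd)
        data.append([100.0 + store_effect + rng.gauss(0.0, noise_sd)
                     for _ in range(n_per_store)])
    return data


def naive_se(data):
    """Standard error of the overall mean, treating all observations as independent."""
    flat = [y for store in data for y in store]
    return statistics.stdev(flat) / len(flat) ** 0.5


def store_level_se(data):
    """Standard error of the overall mean based on the store means (balanced design)."""
    means = [statistics.mean(store) for store in data]
    return statistics.stdev(means) / len(means) ** 0.5


data = simulate_sales()
print(f"naive SE:       {naive_se(data):.3f}")
print(f"store-level SE: {store_level_se(data):.3f}")
```

With store-to-store variation this large relative to the within-store noise, the naive standard error comes out several times too small, so confidence intervals built from it would be far too narrow.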

About the Instructor

David A. Dickey is William Neal Reynolds Distinguished Professor of Statistics at North Carolina State University. He is the co-inventor of the "Dickey-Fuller test" that is commonly discussed in time series texts and is present in many time series software packages. A Fellow of ASA, Dave has presented at all but one of the past CSP conferences and has been program chair of the Business and Economic Statistics section for JSM. At NCSU he was a founding faculty of the Institute for Advanced Analytics, is a member of the Integrated Manufacturing and Systems Engineering Institute, and the Financial Math program. He has an associate appointment in Economics at NCSU and is a member of the Academy of Outstanding Teachers. As a contract instructor for SAS Institute he has taught and helped develop many training courses, including those on time series and mixed models. He is a frequent presenter at SAS Global Forum and is an author in their Books by Users series. He has coauthored several books and written many research articles as well as advising over a dozen PhD students.

Relevance to Conference Goals

The attendees will leave with a new appreciation of modelling and insights into not only how to recognize random effects but how to deal with them. They will have concrete tools to deal with this phenomenon which is very common in practical data analysis. Underlying ideas will be explained in understandable terms and illustrated with interesting and informative examples. Users of SAS will be able to immediately apply the content to their own work and users of other software, with a quick review of syntax, should also be able to "hit the ground running" upon return to work. I anticipate this course raising the level of analysis and insight for all attending.

Software Packages

SAS will be used exclusively but the emphasis on ideas and interpretation should be software independent.

 
SC5 Cleaning Up the Data Cleaning Process: Challenges and Solutions in R
Thu, Feb 15, 8:00 AM - 12:00 PM
Salon D
Instructor(s): Claus Thorn Ekstrøm, Biostatistics, University of Copenhagen; Anne Helby Petersen, Biostatistics, University of Copenhagen
Data cleaning and validation are the first steps in any data analysis, as the validity of the conclusions from the analysis hinges on the quality of the input data. Mistakes in the data can arise for any number of reasons, including erroneous codings, malfunctioning measurement equipment, and inconsistent data generation manuals. We present a systematic, analytical approach to data cleaning that will ensure that the data cleaning process is just as structured and well-documented as the rest of the data analysis. The primary software tool is the dataMaid R package, which implements an extensive and customisable suite of quality assessment tools that can be used to identify potential problems in a dataset. The results are summarised in an auto-generated, non-technical, stand-alone document readable by statisticians and non-statisticians alike. Thus, the course teaches practical skills that aid the dialogue between data analysts and field experts, while also providing easy documentation of reproducible data cleaning steps and data quality control.
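As a rough illustration of what systematic screening can look like, the sketch below runs a few checks over each column of a small dataset. It is written in plain Python and is not the dataMaid API. The function names (`screen_column`, `screen_dataset`), the choice of checks, and the example data are assumptions made for this sketch.

```python
import statistics


def screen_column(values):
    """Return a dict describing potential problems found in one column."""
    problems = {}

    # Check 1: missing values (None, or strings that are empty after trimming).
    n_missing = sum(1 for v in values
                    if v is None or (isinstance(v, str) and not v.strip()))
    if n_missing:
        problems["missing"] = n_missing

    # Check 2: text codes that differ only by case or surrounding whitespace.
    variants = {}
    for v in values:
        if isinstance(v, str) and v.strip():
            variants.setdefault(v.strip().lower(), set()).add(v)
    clashes = {key: sorted(forms) for key, forms in variants.items() if len(forms) > 1}
    if clashes:
        problems["inconsistent_codes"] = clashes

    # Check 3: numeric values more than 1.5 IQRs outside the quartiles.
    numbers = sorted(v for v in values
                     if isinstance(v, (int, float)) and not isinstance(v, bool))
    if len(numbers) >= 4:
        q1, _, q3 = statistics.quantiles(numbers, n=4)
        spread = 1.5 * (q3 - q1)
        outliers = [x for x in numbers if x < q1 - spread or x > q3 + spread]
        if outliers:
            problems["possible_outliers"] = outliers

    return problems


def screen_dataset(columns):
    """Screen every column and keep only those with at least one finding."""
    report = {}
    for name, values in columns.items():
        found = screen_column(values)
        if found:
            report[name] = found
    return report


# Hypothetical messy data: inconsistent codings and an implausible age.
survey = {
    "sex": ["Male", "male ", "Female", None, ""],
    "age": [34, 29, 41, 38, 290, 36, 31],
}
report = screen_dataset(survey)
for name, found in report.items():
    print(name, found)
```

The checks only flag candidates for review. Deciding whether a flagged value is really an error remains a matter for the analyst and the field expert.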

Outline & Objectives

The course will consist of an interchange between teaching and hands-on interactive sessions, where the participants work with messy data in R, mostly using the dataMaid R-package. Thereby, we establish common ground and a common vocabulary for understanding and describing the process of data cleaning as an analytical practice, rather than a number of ad-hoc steps. Moreover, the participants will be introduced to the possibilities of the dataMaid R-package and will learn how to use the software for producing documentable data overview reports that are relevant for their specific data cleaning needs.

If necessary, the course will split into two parallel sessions where experienced R developers are introduced to the semantics of writing dataMaid extensions, while less trained R users will focus on how dataMaid can be used interactively in the R console, so attendees of all skill levels are encouraged to join.

Participants are assumed to be R-users, but not necessarily familiar with writing R extensions.

About the Instructor

Claus Thorn Ekstrøm is a professor at the section of Biostatistics, University of Copenhagen, and has taught statistics courses at bachelor, master, and graduate levels for more than 15 years. He is the creator of and contributor to a number of R packages (dataMaid, MESS, MethComp, SuperRanker) and is the author of "The R Primer" book. He has previously given tutorials on Dynamic graphics in R and the role of interactive graphics in teaching, and won the C. Oswald George prize for his article "Teaching 'Instant Experience' with Graphical Model Validation Techniques" in 2014.

Anne Helby Petersen holds an MSc in statistics and is the main author of the dataMaid R-package and the companion scientific manuscript. She has worked as a TA on several courses in mathematics and statistics.

Relevance to Conference Goals

The short course teaches the participants new practical tools that will aid their daily work with data cleaning and data quality assessment. More specifically, the participants will be able to use the standard dataMaid solution and to make simple, customised extensions, thereby targeting a wide variety of data cleaning challenges. As the dataMaid software focuses on auto-generated reports that are readable by non-R users, these tools will also help the participants in their communication with collaborators, clients and field experts, who might not be familiar with R or statistics in general. Moreover, by discussing data cleaning, not as a nuisance, but as a real, scientific practice, the participants will find themselves better equipped for planning and time-framing data cleaning in the future.

Software Packages

In the course, we will only use open-source software within the domains of the statistical programming language R.
More specifically, we will use the R-package dataMaid, which is available through CRAN. The packages validate, editrules, and deducorrect will also be discussed.

 
SC6 Effective Presentation for Statisticians and Data Scientists: Success=(PD)^2
Thu, Feb 15, 1:30 PM - 5:30 PM
Salon C
Instructor(s): Jennifer H. Van Mullekom, Virginia Tech

Statisticians must be able to effectively convey their ideas to clients, collaborators, and decision-makers. Presenting in the modern world is even more daunting when speakers have the opportunity to employ slideware, videos, and live demos. Unfortunately, university coursework and professional development programs are often not targeted towards sharpening these skills. This short course, developed and taught by statisticians, will provide an opportunity to learn how to employ different methods and tools in the phases of the framework taught. The material covered in the course is geared toward data-based presentations and is based on the works of Garr Reynolds and Michael Alley, among others. The course will emphasize the importance of stepping away from the computer to Prepare an effective message aimed at your core point guided with a series of questions and tips. The Design phase emphasizes the importance of structure, streamlining, and good graphic design accompanied by a series of checklists. Of course, “Practice makes perfect” so we cannot skip this step. Finally, engaging the audience and effectively using the room and equipment are covered in the Deliver phase.

Outline & Objectives

At the end of this course, participants will have an arsenal of techniques, methods, tips, and tricks to prepare, design, practice and deliver effective presentations to decision makers and research audiences.
I. Prepare
a. Questions you must answer before your presentation
b. Steps for creating the story of your facts
c. Tips and tricks
d. Deep dive into analogies, diagrams and examples for statistics
II. Design
a. Simplicity
b. Structure
c. Sight
d. Streamline
e. Data/Statistics Slide Makeover Exercises
III. Practice
a. How to practice
b. How to use practice to improve your delivery
IV. Deliver
a. How you look, sound, and move
b. Overcoming nerves
c. Give a 5-minute presentation using the techniques you have learned during the day
V. Special Topics
a. Webinars & Teleconferences
b. Global Audiences
c. Non-native English Speakers
d. Dealing with Difficult People
e. Casual Meeting Updates and Report Outs

About the Instructor

Jennifer Van Mullekom is currently an Associate Professor of Statistical Practice at Virginia Tech where she leads the Laboratory for Interdisciplinary Statistical Analysis (LISA). Here she provides statistical collaboration for on-campus research and her duties include securing funding, setting direction, mentoring and teaching students, and providing technical statistical support to LISA. Prior to this role, she served as a Senior Consulting Statistician with DuPont. She has been actively involved in the American Statistical Association's Section on Physical and Engineering Sciences (SPES) since 1998 and has held various positions in the organization. Jen has participated in numerous conference committees with ASA including the FTC and the CSP. She has also co-developed the American Statistical Association’s “Effective Presentations for Statisticians” Course. Her statistical areas of interest include equivalence testing, regression modeling, response surface designs, mixed models, and statistical engineering.

Relevance to Conference Goals

This short course embodies the topic of communicating complicated analyses in simple ways for non-statisticians/decision makers. Effective communication then encourages collaboration and consequently leads to career advances.

Note: This course was developed in conjunction with ASA's career success factors task force several years ago. It is set up to be a full 8 hours. Portions of it could be set up as either a half-day course or a tutorial but the full content could not be condensed to two or four hours. It also works very well as two half-day sessions, which allows participants time to work on their presentations for the final session.

Software Packages

PowerPoint, Keynote, Prezi

 
SC7 Statistical Learning Methods in R
Thu, Feb 15, 1:30 PM - 5:30 PM
Salon E
Instructor(s): Kelly Sue McConville, Swarthmore College

Applied statisticians are often confronted with difficult modeling problems where standard regression approaches are not appropriate. For example, it may be that the number of possible predictors is large relative to the sample size or that the relationship between the variables is non-linear. This course will cover several statistical learning techniques which are designed to handle these difficult modeling problems. In particular, we will study penalized regression techniques (lasso, ridge, elastic net), non-parametric regression (regression and smoothing splines), and classification methods (support vector machines, trees). Using data from the Bureau of Labor Statistics, participants will learn how to fit these models in R. R Markdown files with the relevant code will be provided so that participants can actively follow along with the demonstrations.

Outline & Objectives

The three main topics of the course are:

1. Penalized parametric regression with the lasso, ridge and elastic net.

2. Penalized nonparametric regression with regression and smoothing splines.

3. Classification with logistic regression and support vector machines.

By the end of the course, participants should

• Have a basic understanding of several statistical learning methods and their applicability.

• Be able to build the models in R.

• Be able to compute measures that allow for comparisons between methods.

About the Instructor

Dr. McConville is an Assistant Professor of Statistics at Swarthmore College. She has a PhD in Statistics from Colorado State University. Her research focuses on the adaptation of statistical learning techniques to data from a complex sample design. She collaborates with the US Forest Service Forest Inventory and Analysis Program and the US Bureau of Labor Statistics. She teaches statistical learning and R in many of her courses at Swarthmore.

Relevance to Conference Goals

Through practical examples, the course will expose participants to popular statistical learning methods. Learning these powerful predictive techniques will expand their modeling toolbox.

Software Packages

R and RStudio will be used throughout the course. Participants are strongly encouraged to bring computers with R and RStudio installed beforehand.

 
SC8 NISS Shortcourse: A Survey of Modern Data Science
Thu, Feb 15, 1:30 PM - 5:30 PM
Salon D
Instructor(s): David Banks, Dept. of Statistical Science, Duke University

Modern data science is driven by applications, and these often entail Big Data and machine learning perspectives. This short course reviews key ideas and methods in nonparametric regression (starting with cross-validation and light bootstrap asymptotics, then moving on to the additive model, the generalized additive model, and neural networks). It also covers variable selection, with the Lasso and the Median Model, and describes the p >> n problem in the context of contributions by Candes and Tao, Donoho and Tanner, and Wainwright. The course next treats classification, with emphasis upon Random Forests, boosting, and ensemble strategies such as bagging and stacking.

Outline & Objectives

The course intends to convey the intuition and heuristics that underlie the evolution of data mining, machine learning, and data science from the 1990s to the present day. The target audience is MS-level practitioners who have some comfort with regression analysis.

About the Instructor

David Banks is a professor at Duke University who has taught this material in a graduate course on machine learning on multiple occasions. In 2017, he taught this short course at Kansas State University's Agricultural Statistics conference.

Relevance to Conference Goals

This short course aligns with the CSP's Theme 3: Data Science and Big Data. It will introduce people to a toolkit of methodologies, with instruction and guidance on when and why to use these tools, and what issues may arise. Attendees will learn statistical methods that should help them to advance in their analytical careers.

Software Packages

No specific software will be taught. Most of the methods discussed have implementations in R, Matlab, and (sometimes) SAS.