Online Program

Short Courses (Full Day)
Thursday, February 20
SC1 Enhancing Big Data Projects Through Statistical Engineering
Thu, Feb 20, 8:00 AM - 5:00 PM
Bayshore V
Instructor(s): Richard De Veaux, Williams College; Roger Wesley Hoerl, Union College; Ron Snee, Snee Associates

Massive data sets, or Big Data, have become more common recently due to improved technology for data acquisition, storage, and processing. New tools have been developed to analyze such data, including classification and regression trees (CART), neural nets, and methods based on bootstrapping. These tools make high-powered statistical methods available not only to professional statisticians but also to casual users. As with any tool, the results to be expected are proportional to the knowledge and skill of the user, as well as the quality of the data. Unfortunately, much of the professional literature may give casual users the impression that, given a powerful enough algorithm and a lot of data, good models and good results are guaranteed at the push of a button.
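To make the "push of a button" point concrete, the short sketch below shows how little code one of the tools named above (CART) requires in R. The rpart package and the built-in iris data are illustrative assumptions, not part of the course materials.

    # Illustrative only: fitting a classification tree is a one-liner in R.
    library(rpart)

    fit <- rpart(Species ~ ., data = iris, method = "class")
    printcp(fit)   # fitted tree and cross-validated error, at the push of a button
    # Nothing above checks data quality, study design, or subject-matter sense,
    # which is exactly the gap statistical engineering is meant to address.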

Conversely, if one applies sound principles of statistical engineering to the Big Data problem, several potential pitfalls become obvious. We consider the consequences of four major issues: (1) lack of a disciplined approach to modeling, (2) reliance on "one-shot studies" rather than sequential approaches, (3) assuming all data are of high quality, and (4) ignoring subject-matter knowledge.

Outline & Objectives

Outline
Introduction
Uniqueness of Big Data Projects
-Relative to traditional statistics projects
New Methods for Big Data
What Could Go Wrong?
-Big blunders with Big Data
-Sequential approaches versus one-shot studies
-Integration of analytics with sound subject matter theory
-Data quality
How Statistical Engineering Can Help
-Brief review of statistical engineering
-Theory and key principles
-Building blocks of statistical engineering (major phases)
Application of Statistical Engineering to Big Data
-Discussion
-Breakout exercises
Recap and Summary
-What have we learned?


Objectives:
• Develop a clear understanding of Big Data projects: what is new and unique versus what is not.
• Develop the skills and confidence needed to attack Big Data projects in a logical, sequential manner, applying the phases and principles of statistical engineering.
• Clarify how easily significant blunders can occur in Big Data projects if fundamentals are ignored.
• Develop a solid understanding of the theory and practice of statistical engineering.

About the Instructor

Ron Snee, Dick De Veaux, and Roger Hoerl each have significant track records within the ASA and the profession in general. De Veaux has given numerous presentations on data mining and has published extensively in that area. Snee and Hoerl are primarily responsible for the development of statistical engineering as a discipline and, again, have published extensively on that topic. By partnering on this workshop, they bring a unique set of skills and experiences to the issue of enhancing Big Data projects through the application of statistical engineering principles. In an effort to keep the level of documentation reasonable, we have not included vitae; these can be forwarded if needed.

Relevance to Conference Goals

We feel that this workshop relates quite well to all three of the major conference themes. Obviously, it relates directly to Big Data analytics, in that participants will learn tangible skills that they can apply to their current or future Big Data projects. In addition, the application of statistical engineering principles, especially on large, complex, unstructured problems, enhances the leadership skills of statisticians and thereby provides a vehicle for career advancement. These principles can also be applied to enhance more traditional modeling and analysis projects, which is the third major theme.

 
SC2 Design and Analysis of Experiments Using Generalized Linear Mixed Models
Thu, Feb 20, 8:00 AM - 5:00 PM
Bayshore I
Instructor(s): Elizabeth Claassen, University of Nebraska; Walt Stroup, University of Nebraska

This course presents applications of generalized linear mixed models (GLMMs), with a particular focus on GLMMs for the design and analysis of experiments with non-normal data. The material is at an applied level, accessible to those familiar with linear models.

Participants will learn that GLMMs are an encompassing family and will come to understand the differences and similarities in estimation and inference within that family. We discuss issues in working with correlated, non-normal data, such as overdispersion, marginal versus conditional models, and model diagnostics. We present GLMMs for common non-normal response variables (count, binomial and multinomial, time-to-event, continuous proportion) in conjunction with common designs (blocked, split-plot, repeated measures). Numerous examples will be presented.

The afternoon continues with GLMM applications and associated issues, including comparison of estimation methods, computation of power and sample size, model selection, and inferential tasks with and without adjustments.

Numerous examples will be used to illustrate all topics. Examples use tools in SAS/STAT and R, but the principles should be applicable to any GLMM-capable software.
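As a small taste of the kind of model covered in the morning session, here is a minimal sketch in R of a conditional binomial GLMM for a randomized complete block design. The lme4 package and the simulated data are illustrative assumptions; the course's own examples use SAS/STAT and R tools chosen by the instructors.

    # A minimal sketch, assuming the lme4 package and simulated data:
    # a logit-link binomial GLMM with fixed treatment effects and a
    # G-side random block effect.
    library(lme4)

    set.seed(1)
    d <- expand.grid(block = factor(1:8), trt = factor(c("A", "B", "C")))
    d$n <- 20                                   # binomial trials per experimental unit
    d$y <- rbinom(nrow(d), size = d$n,
                  prob = plogis(-0.5 + 0.6 * (d$trt == "B") + 1.0 * (d$trt == "C")))

    fit <- glmer(cbind(y, n - y) ~ trt + (1 | block),
                 family = binomial(link = "logit"), data = d)
    summary(fit)                                # estimates on the logit scale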

Outline & Objectives

1. From Linear Model to GLMM
A. General Setting
B. Linear Models and Linear Mixed Model (LM, LMM)
C. Generalized Linear Model (GLM)
D. Generalized Linear Mixed Model (GLMM)
2. Marginal or Conditional Models
A. Defining a Model from Design Properties
B. Overdispersion and Other Design-Induced Issues
C. G- and R-side Random Effects
D. GEE versus GLMM
E. Distributional Implications
3. Estimation
A. (Restricted) Maximum Likelihood
B. Quasi-Likelihood/GEE
C. Pseudo-Likelihood
D. Laplace and Quadrature
E. Model-Based and Empirical (“Sandwich”) Estimators
4. Rates and Proportions
A. Distributions
B. Binomial Proportions
C. Binary Data
D. Multinomial
E. Beta – Continuous Proportions
5. Counts
A. Distributions
B. Poisson or Negative Binomial?
C. Modeling with Offsets
D. Zero-inflated Models
6. Within-Subject Correlation
A. Repeated Measures Background
B. Review of Methods for Normally-Distributed Data
C. Extension to Non-Normal Data – Similarities and Differences
D. Spatial Variation
7. Power, Precision and Sample Size
A. Background
B. Power & Sample Size for Continuous, Count, and Binomial Data (see the sketch following this outline)
C. Comparing Competing Designs using GLMM tools
D. Longitudinal & Spatial Data
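To connect topic 7B above to code, the sketch below estimates power for a binomial GLMM by simulation: generate data under assumed effect sizes and block variance, refit the model, and count rejections. The lme4 package, the effect sizes, and the likelihood-ratio test are illustrative assumptions and may differ from the GLMM-based power tools the instructors present.

    # A hedged sketch of simulation-based power for a binomial GLMM in a
    # randomized complete block design (assumed values throughout).
    library(lme4)

    power_sim <- function(n_blocks = 8, n_per_unit = 20,
                          trt_logit = c(0, 0.6, 1.0),    # assumed treatment effects (logit scale)
                          block_sd = 0.5, n_sim = 200, alpha = 0.05) {
      reject <- logical(n_sim)
      for (i in seq_len(n_sim)) {
        d <- expand.grid(block = factor(seq_len(n_blocks)),
                         trt = factor(c("A", "B", "C")))
        b <- rnorm(n_blocks, sd = block_sd)              # random block effects
        eta <- -0.5 + trt_logit[as.integer(d$trt)] + b[as.integer(d$block)]
        d$y <- rbinom(nrow(d), size = n_per_unit, prob = plogis(eta))
        fit  <- glmer(cbind(y, n_per_unit - y) ~ trt + (1 | block),
                      family = binomial, data = d)
        fit0 <- update(fit, . ~ . - trt)                 # null model: no treatment effect
        reject[i] <- anova(fit0, fit)[2, "Pr(>Chisq)"] < alpha   # likelihood-ratio test
      }
      mean(reject)                                       # estimated power
    }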

About the Instructor

Walt is a Professor in the Department of Statistics at the University of Nebraska-Lincoln, where he teaches statistical modeling and design of experiments. His research concerns mixed model applications in collaboration with the agricultural, natural resource, medical, pharmaceutical, education, and behavioral sciences. He participated in a multi-state mixed model project that motivated the development of SAS PROC MIXED. He co-authored textbooks on SAS for linear models, SAS for mixed models, and GLMMs for plant and natural resource sciences, and he authored Generalized Linear Mixed Models: Modern Concepts, Methods and Applications (2013).

Elizabeth is a doctoral candidate completing her dissertation on bias correction for variance component estimation in GLMMs. With Walt, she has co-taught UNL's graduate-level design of experiments course and assisted with the advanced modeling course. She has been the primary instructor of introductory undergraduate statistics courses and has consulted with researchers from a variety of disciplines. She worked at JMP testing new applications and presented preliminary results of her research at JSM 2013. Elizabeth expects to graduate in May 2014.

Relevance to Conference Goals

Modeling in the 21st century has grown beyond the basic ANOVA and linear models most people learned, especially when the data cannot reasonably be assumed to be normally distributed. When data aren't normal, why should your statistical analysis be? With continued improvements in computing power and a wide variety of software options, the ability to model complex experiments and quasi-experiments with response distributions not limited to the Gaussian is a required skill for data analysts. This course will introduce participants to (or provide a refresher on) generalized linear mixed models as an overarching family that contains all of its predecessors, and it will familiarize them with the essential thought processes required to use GLMMs effectively and with a wide variety of practical applications.

 
SC3 Elegant R Graphics with ggplot2
Thu, Feb 20, 8:00 AM - 5:00 PM
Palma Ceia III
Instructor(s): Isabella R. Ghement, Ghement Statistical Consulting Company Ltd.

R comes equipped with several packages for producing elegant graphics, and ggplot2 is one of the most powerful and versatile of these packages. This one-day course will provide participants with an in-depth introduction to ggplot2 in the context of graphics production for exploratory and confirmatory data analyses. Participants will learn how to use ggplot2 to produce, customize, and export publication-quality graphics that facilitate the communication of data-driven insights. In particular, participants will gain an understanding of the ggplot2 philosophy, syntax, and capabilities; learn how to create standard and advanced statistical graphs; and become skilled at customizing graphs through the addition of labels, titles, symbols, colors, legends, scales, annotations, layers, and themes. Participants also will learn how to combine the presentation of numerical and visual data summaries in the same graph, save ggplot2 output in a variety of standard graphical formats, and embed this output in automated reports and presentations. This hands-on course will offer participants the opportunity to practice the use of ggplot2 in real time. Participants are required to have basic knowledge of R and bring their laptops pre-installed with R and ggplot2.
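As a small preview of the workflow the course teaches, the sketch below builds a layered ggplot2 plot, customizes its labels and theme, and exports it to PDF. The data set (R's built-in mtcars), the file name, and the particular geoms are illustrative choices, not taken from the course materials.

    # A minimal ggplot2 sketch: layers, labels, a theme, and export via ggsave().
    library(ggplot2)

    p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
      geom_point(size = 2) +                     # layer: raw data
      geom_smooth(method = "lm", se = FALSE) +   # layer: per-group linear fits
      labs(title = "Fuel economy vs. weight",
           x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders") +
      theme_minimal()                            # theme customization

    ggsave("mpg_vs_weight.pdf", plot = p, width = 6, height = 4)  # export to PDF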

Outline & Objectives

The outline for this course is as follows:

1) Overview of RStudio (e.g., installation, menus, workflow);
2) Basics of ggplot2 (e.g., philosophy, capabilities, syntax);
3) Producing standard graphics with ggplot2 (e.g., histograms, boxplots, scatter plots, time series plots, bar charts);
4) Producing advanced graphics with ggplot2 (e.g., scatterplot matrices, conditional plots obtained through faceting);
5) Customizing graphics constructed with ggplot2 (e.g., labels, titles, symbols, colours, legends, scales, annotations, layers and themes);
6) Saving graphics constructed with ggplot2 in a variety of formats (e.g., pdf, jpeg, png);
7) Embedding ggplot2 graphics in automated reports and presentations.

The objective of this course is to teach participants how to produce, customize, and export publication-quality graphics using RStudio and ggplot2. By exploiting ggplot2's power and versatility, participants will be able to create elegant, complex, multi-layered statistical graphics that facilitate the communication of data-driven insights.

About the Instructor

Dr. Isabella R. Ghement is an independent statistical consultant and trainer based in Vancouver, British Columbia, Canada. She has extensive R training experience and presented the course "A Crash R Course on Statistical Graphics" at the ASA Conference on Statistical Practice held in New Orleans in February 2013. Since 2007, Dr. Ghement has taught an advanced regression course using R on a yearly basis to graduate students in the Sauder School of Business at the University of British Columbia. Dr. Ghement's statistical consulting clients include federal and provincial government agencies, contract research organizations, and academic researchers. Her research expertise covers areas such as partially linear regression modeling, robust regression modeling, and mixed treatment comparisons.

Relevance to Conference Goals

This course falls under the umbrella of conference Theme 4 (Software and Graphics). As such, it will help participants master and employ modern data visualization methods using RStudio and the R graphics package ggplot2. The course will also encourage participants to adhere to good principles of reproducible research when producing automated reports and presentations that include graphical output.