Online Program

Saturday, February 21
T1 A Case Study in Big Data Analytics
Sat, Feb 21, 2:00 PM - 4:00 PM
Maurepas
Instructor(s): Patrick Hall, SAS Enterprise Miner

So what exactly do you do when faced with a huge data set from which you are to derive insights? This happens in banks, insurance companies, government agencies, manufacturing centers, and other institutions all the time. This tutorial illustrates best practices for mining large data sets in the context of a case study. Participants will learn real-world techniques to explore and preprocess data; to select, extract, and engineer the most predictive features; to build the best predictive model for the job at hand; and to leverage predictive analytics to make decisions for their organization. Instructors also will point out common pitfalls and trade-offs inherent to contemporary Big Data approaches. SAS Enterprise Miner will be used for the analyses, but the focus will be on the methods and not the software. Participants will have access to the example data for further study.

Outline & Objectives

The objectives of this tutorial are for the participants to become familiar with general terminology, best practices, and practical, contemporary approaches for working with large data sets including the following:

• Data exploration

• Data preparation


• Data reduction techniques such as sampling, feature selection, feature extraction, and feature engineering

• Statistical and machine learning approaches for predictive analytics

• Scoring large data sets with predictive models

The tutorial will address the subjects outlined below:

1. Understanding the primary goal of a project in terms of inference or prediction: GLM and decision trees vs. machine learning algorithms.

2. Buzzwords: What are "Analytics", "Big Data", and "Machine Learning"?

3. Big Data Best Practices

4. Data Preparation and Exploration

5. Data Reduction

6. Prediction and Scoring

About the Instructor

Patrick designs new data mining and machine learning technologies for SAS. He is a Cloudera-certified data scientist and a certified SAS Enterprise Miner predictive modeler. Patrick has two patent applications for his recent work in unsupervised learning. He studied computational chemistry before receiving an MS degree in analytics from the Institute for Advanced Analytics at North Carolina State University.

Relevance to Conference Goals

This tutorial will teach practical techniques that are broadly applicable for analysts, data scientists, machine learning engineers, and statisticians who mine large data sets in academia or industry.

 
T2 An Introductory Tutorial on Mixed Models
Sat, Feb 21, 2:00 PM - 4:00 PM
Napoleon C1
Instructor(s): Funda Gunes, SAS Institute Inc.

Mixed model analysis is one of the cornerstones of modern statistics. It extends the general linear model for independent, equal-variance (homoscedastic) data by allowing a more flexible covariance structure for the error term. Using mixed models, you can fit models to a variety of data that follow the normal distribution, including repeated measurements and data from a randomized block design. This tutorial introduces the basics of mixed model methodology and illustrates the analysis of linear mixed models in typical applications, with numerous examples using the MIXED procedure in SAS/STAT software. This tutorial also includes an overview of other mixed modeling procedures in SAS, giving a brief introduction to analyzing generalized linear mixed models by using the GLIMMIX procedure and discussing the scenarios in which you would use nonlinear mixed models and the NLMIXED procedure. Prerequisites are a working knowledge of the general linear model and basic matrix algebra.
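
For readers new to the topic, the extension described above is commonly written as follows. This is the standard textbook form of the linear mixed model, not something taken from the tutorial handouts; the symbols are assumptions chosen here for illustration.

```latex
\begin{aligned}
y &= X\beta + Z\gamma + \varepsilon, \\
\gamma &\sim N(0, G), \qquad \varepsilon \sim N(0, R), \qquad \operatorname{Cov}(\gamma, \varepsilon) = 0, \\
\operatorname{Var}(y) &= Z G Z^{\top} + R .
\end{aligned}
```

Here X and Z are the fixed-effects and random-effects design matrices. Setting Z = 0 and R = σ²I recovers the general linear model with independent, equal-variance errors.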

Outline & Objectives

Overview
• Designed experiments
• Fixed effects versus random effects
Linear mixed models – MIXED procedure
• Randomized block design
• Nested mixed models
• Repeated measures
• High-performance mixed models analysis
Generalized Linear Mixed Models – GLIMMIX procedure
Nonlinear Mixed Models – NLMIXED procedure

About the Instructor

Funda Gunes is a research statistician in the statistical applications department at SAS. She completed her PhD in statistics at North Carolina State University with a focus on model selection methods. As a research statistician in statistical R&D at SAS, she often gives expository talks and two-hour tutorials on a variety of topics, including mixed models, model selection, and Bayesian statistics. These presentations emphasize basic concepts and introduce applied statisticians to new methodology with relevant examples.

Relevance to Conference Goals

I attended CSP twice in the last few years. I especially enjoyed meeting statistical practitioners and learning how they use statistics in their work. Based on that experience, I think the CSP audience is a perfect fit for the content and level of this proposed tutorial, and I would love to participate.

 
T3 Speak & Connect: Harnessing PowerPoint
Sat, Feb 21, 2:00 PM - 4:00 PM
Napoleon D3
Instructor(s): Andrew Causley, Speak & Connect

Data-heavy presentations can overload an audience with information quickly, causing them to tune out. Learn how to create and deliver PowerPoint presentations that are interesting, effective, and memorable! It’s a fresh approach, one that combines information with effective visuals and personal engagement to connect with an audience in a credible and captivating manner. If you can answer YES to any of the following questions, you should attend this tutorial. Have some of your slides been loaded with text, bullet points, or complex data? Have there been times when you’ve read off your slides? Has your audience ever looked bored, inattentive, or asleep? Learn how to share data and information and create better decks.

Outline & Objectives

Objective:
Create and deliver PowerPoint presentations that effectively educate your audience by keeping them engaged and motivated.

Outline:
• The Do’s and Don’ts of slide presentation software (PowerPoint)
• Verbal vs Visual communication
• Slide Types
• Slide Test
• Interacting with the audience and the technology

Outcomes
• Pinpoint the message behind the information
• Transform data-heavy slides into effective visuals
• When to use (and not use) PowerPoint
• How to create slides that are clear, crisp, and high-impact
• A sure-fire method to assess the effectiveness of your slides
• Ideas to open with impact
• How to properly utilize support materials
• Structuring modular presentations
• Effective ways to close

About the Instructor

A veteran broadcast professional, Debra Stamp is President of Debra Stamp Productions and Principal Trainer for Speak & Connect, a growing business communication enterprise featuring communication skills coaching, instruction on creating truly effective visuals, and guidance in developing and delivering strong, memorable messages.

Debra is a professional voice talent and continues to work in radio, television, and business using her communication skills to inform, train, and entertain listeners and viewers around the world.

As a coach and trainer, she shares her techniques and years of experience with enthusiasm and energy. Her warm personal teaching style creates a learning atmosphere that’s friendly, interactive, and fun.

Andrew Causley is the owner of Ballistic Fish Studios and Lead Trainer for Speak & Connect, a growing business communication enterprise featuring communication skills coaching, instruction on creating truly effective visuals, and guidance in developing and delivering strong, memorable messages.

Creating and capturing compelling visuals is what he does best, and he has the technical know-how to get the most out of presentation software.

Relevance to Conference Goals

The key to successfully using PowerPoint lies in recognizing that it is a Visual Aid, there to support the presenter.

Simplified slides, focused messages, and engaged audiences are just a few of the benefits experienced when coming out from behind the data.

Once participants experience this tutorial, they will never look at PowerPoint in the same way again!

 
T4 Tutorial on Parallel Programming in R
Sat, Feb 21, 2:00 PM - 4:00 PM
Napoleon C2
Instructor(s): Miranda Fix, Colorado State University; Josh Hewitt, Colorado State University; Henry Randall Scharf, Colorado State University

This tutorial introduces participants to high-performance computing in R for analyzing research data and developing practical analytics. R is a free, open-source programming language that concisely supports a wide range of statistical computing and machine learning needs. Modern data sets are large and computational procedures can be intense, which can become prohibitive for practical data analysis projects. This tutorial introduces participants to workflows and packages that let practitioners use R to take advantage of the power of modern computing resources like multicore architectures and cloud technologies. Applications and examples include demonstrating parallel forms of popular classic and machine learning methods, using bootstrapping and cross validation to estimate uncertainty and accuracy, simulating data to analyze “what if” scenarios, and discussing related topics. Demonstrations will be presented with R. Attendees are encouraged to bring laptops with R installed so they may follow along and experiment with these tools.

Outline & Objectives

This tutorial's goal is to introduce participants to parallel computing ideas and give them tools they can use to scale their analyses up so they can handle large datasets and problems in their organization. The tutorial will advance through several sections that:

- Introduce parallel computing and identify common “parallelizable” tasks.
- Explain and demonstrate statistical procedures that excel with parallelization.
- Use several parallel programming R packages.
- Show integrations between R and the Hadoop/MapReduce cloud computing framework.

Prerequisites: Familiarity with data analysis methods and general-purpose R programming.
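
To make the idea of a “parallelizable” task concrete, here is a minimal sketch of a bootstrap standard-error estimate in which each replicate is an independent job. It is written in Python rather than R and is not taken from the tutorial materials; the function names `bootstrap_mean` and `bootstrap_se`, the seeding scheme, and the simulated data are all hypothetical choices made for illustration.

```python
import random
import statistics
from concurrent.futures import ProcessPoolExecutor


def bootstrap_mean(task):
    """Compute the mean of one bootstrap resample (one independent job)."""
    data, seed = task
    rng = random.Random(seed)  # per-task seed keeps each replicate reproducible
    resample = rng.choices(data, k=len(data))  # draw with replacement
    return statistics.fmean(resample)


def bootstrap_se(data, n_reps=200, executor=None):
    """Estimate the standard error of the mean from n_reps bootstrap replicates.

    With executor=None the replicates run serially; otherwise they are
    mapped across the executor's workers. Results are identical either way
    because every replicate carries its own seed.
    """
    tasks = [(data, seed) for seed in range(n_reps)]
    mapper = map if executor is None else executor.map
    means = list(mapper(bootstrap_mean, tasks))
    return statistics.stdev(means)


if __name__ == "__main__":
    # Simulated data, assumed here purely for demonstration.
    rng = random.Random(0)
    data = [rng.gauss(10.0, 2.0) for _ in range(500)]

    print("serial SE:  ", bootstrap_se(data))
    with ProcessPoolExecutor() as pool:
        print("parallel SE:", bootstrap_se(data, executor=pool))
```

Because the replicates share no state, the same pattern applies to cross-validation folds or simulation runs: build a list of independent tasks, then map a worker function over them.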

About the Instructor

Henry Scharf is a PhD student at Colorado State University with a background in both teaching and computational statistics. He received his master's degree in education from the University of Arizona and has worked as an instructor at CSU. He currently works in conjunction with the National Renewable Energy Laboratory on questions surrounding prioritized compression of massive datasets sensitive to specific secondary analysis.

Josh Hewitt is an MS/PhD student at Colorado State University with strong interests in teaching and statistical theory and computing. He holds a master's degree in Applied Mathematics and Statistics from The Johns Hopkins University and has worked on big data and analytic development projects with Booz Allen Hamilton for the United States Government.

Miranda Fix is an MS/PhD student at Colorado State University with experience in teaching and statistical consulting. She earned her master's degree in Quantitative Ecology from the University of Washington, where she served as a teaching assistant and led R tutorials for several courses. She is currently working with the National Center for Atmospheric Research on analyzing large climate datasets.

Relevance to Conference Goals

This tutorial relates to Theme 3: Big Data Prediction and Analytics, and Theme 4: Software, Programming, and Graphics. The tutorial introduces participants to key ideas and examples in parallel statistical computing that enable practical Big Data and Analytic projects. The tutorial simultaneously exposes attendees to R packages they can use to analyze data and develop analytics in their own organizations.