Online Program

Thursday, February 19
SC3 What Can We Learn from Software Engineers?
Thu, Feb 19, 8:00 AM - 12:00 PM
Napoleon C2
Instructor(s): Paul Teetor, Quant Development LLC

Do any of the following problems sound familiar? Your organization is swimming in SAS or R code. You’ve saved numerous versions because you can’t afford to lose anything. People are unsure of which version is best. Testing your code is difficult. You’ve cut and pasted your code so often you’re seeing the same parts over and over. Everyone does their work differently, and people can’t share code easily. The code is now so convoluted that newcomers cannot understand it. The thought of major changes makes your head hurt.

Software engineers have spent decades dealing with these problems, and the result is a body of best practices for managing software. These best practices are an art and not well known outside the discipline. This course will explain the techniques of software engineering and how they apply to managing your software. Topics range from code-level practices to design issues and project control.

The course will focus on software engineering in the context of R, which provides a rich environment for statistical programming. Participants are expected to arrive with R and RStudio installed on their laptops. Some familiarity with R is required.

Outline & Objectives

The course alternates between explanation and exercises.

Coding standards – Adopting a consistent style.
Exercise: Clean up dirty code.

Defensive programming – Protecting yourself from others... and yourself.
Exercise: Add defensive checks to a program.

Code walk-throughs – Shine a light on your code.
Exercise: Walk-through code together: bad code, good code.

Version control – Keeping track of your intellectual property.
Demonstration: Create a simple code repository.

Unit testing – Start with the quality control.
Exercise: Write and execute unit tests.

Libraries – Don't reinvent the wheel.
Exercise: Create a simple library.

Refactoring – Apply your powers of abstraction.
Exercise: Create a variation on a function, then refactor both versions.

Modularity – Keep your design strong and flexible.
Example: Distinguishing input vs. calculation vs. presentation.

Execution environments – Separate sandboxes for developing, testing, and production.

Integration testing – Be confident that everything works harmoniously.

Parting words – A few lessons from project management.
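
As a rough illustration of two outline topics above, defensive programming and unit testing, here is a minimal sketch. It is written in Python only for illustration (the course itself works in R), and the function standard_error and its specific checks are hypothetical examples, not course material.

```python
import math
import unittest


def standard_error(values):
    """Return the standard error of the mean for a list of numbers."""
    # Defensive checks: fail early, with a clear message, on bad input.
    if not isinstance(values, (list, tuple)):
        raise TypeError("values must be a list or tuple of numbers")
    if len(values) < 2:
        raise ValueError("need at least two observations")
    if any(isinstance(v, bool) or not isinstance(v, (int, float)) for v in values):
        raise TypeError("every observation must be a number")
    if any(math.isnan(v) for v in values):
        raise ValueError("missing values (NaN) are not allowed")

    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    return math.sqrt(variance / n)


class StandardErrorTest(unittest.TestCase):
    """Unit tests that pin down both the normal case and the failure cases."""

    def test_known_value(self):
        # Sample variance of [1, 2, 3, 4] is 5/3, so SE = sqrt(5/12).
        self.assertAlmostEqual(standard_error([1, 2, 3, 4]), math.sqrt(5 / 12))

    def test_rejects_short_input(self):
        with self.assertRaises(ValueError):
            standard_error([1.0])

    def test_rejects_missing_values(self):
        with self.assertRaises(ValueError):
            standard_error([1.0, float("nan")])
```

Saved to a file, the tests can be run with Python's built-in runner (python -m unittest followed by the file name). The same pattern of input checks plus small automated tests carries over to R.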

About the Instructor

I am both a professional software engineer and a professional statistician. My consulting practice focuses on the financial services sector, where I build quantitative applications by combining those twin backgrounds. I have over 30 years of professional experience.

My public speaking and writing have been quite popular. Last year, I was asked to give the Introduction to R Workshop at the University of Chicago (Financial Mathematics program). This year, I've been asked to give that workshop plus a keynote speech at the Great Plains R User's Group conference. My talks have been accepted three times for the annual R/Finance Conference. I am a frequent presenter at the Chicago R User's Group. I've taught undergraduate classes in statistics and in computer programming.

I spoke at CSP 2014, where my presentation on Bootstrapping Time Series Data received excellent evaluations.

I am the author of the R Cookbook (O'Reilly, 2011), one of the top-selling books for R.

My degrees are in Computer Science (BS, Cornell University; MS, Northwestern University) and Applied Statistics (MS, DePaul University).

Relevance to Conference Goals

I see a surprising number of statisticians and analysts who are “swimming in their software”. I've taught software engineering techniques to my clients, and they say it revolutionized the management of their code base. Practitioners and organizations that adopt these techniques have experienced higher quality results with less work and less chaos. Participants in this course will bring that benefit to their organizations.

 
SC4 How to Start and Run an Independent Statistical Consulting Business
Thu, Feb 19, 8:00 AM - 12:00 PM
Napoleon D3
Instructor(s): Stephen David Simon, P.Mean Consulting

An independent statistical consulting job is both rewarding and challenging. If you follow this career path, you will need to learn many business skills. This course will review practical issues you will face in setting up an independent consulting business. Should you set up a limited liability corporation or a subchapter S corporation? Should you bill by the hour or the project? What insurance do you need? Should you have a standard contract in place prior to any consulting work? In addition to these legal and accounting requirements, there are human issues that you as an independent consultant will have to face. Your most important job is finding new clients. The best method, by far, is “word of mouth,” and there are several strategies you can adopt to enhance your visibility and increase the number of referrals you receive. You also need to know how to keep your current clients happy. This class will include several small-group exercises during which you will share your thoughts and experiences on how to handle specific cases involving independent statistical consulting. No specific knowledge about business models, accounting, or legal issues will be assumed.

Outline & Objectives

Course outline: The first lecture will cover the types of independent consultants and contrast these with a consultant who is part of a larger organization. This will be followed by a small group exercise where students discuss their career goals for the next one and five years. This is followed by lectures on company types (sole proprietorship, partnership, limited liability corporation) and a discussion of the pros and cons of billing by the hour versus by the project.

A second small group exercise presents a hypothetical consulting project and asks each group to prepare an estimate of the entire project cost or of the hours needed. Additional lectures will cover contracts, accounting, and insurance.

The last two lectures will discuss finding new clients and keeping existing clients happy. This includes a third small group exercise on a hypothetical consulting scenario that has gone sour. Students will discuss whether they should end the consulting relationship or find ways to get the interaction back on track.

The target audience is anyone who is considering a career as an independent consultant or who is curious about the advantages and disadvantages of this type of work.

About the Instructor

Steve Simon is a part-time independent statistical consultant with P.Mean Consulting and a part-time faculty member in the Department of Biomedical and Health Informatics at the University of Missouri-Kansas City.

He presented a short course, "Promoting Your Consulting Career in the Era of Web 2.0," at the inaugural Conference on Statistical Practice in 2012 and led a roundtable discussion on the same topic at the 2011 Joint Statistical Meetings. A brief summary of this talk is on his website: http://www.pmean.com/12/promoting.html

He also was a panel member at the 2011 JSM on "Successful Statistical Consulting: The Practicalities" and discussed "How Independent Statistical Consulting is Different." For a brief overview of this presentation, see http://www.pmean.com/11/ConsultingDifferences.html

Dr. Simon is the author of a book published by Oxford University Press, "Statistical Evidence in Medical Trials: What Do the Data Really Tell Us?" He has a website (www.pmean.com) with over 1,300 pages on statistics, research ethics, and evidence-based medicine and is an active participant in the Statistical Consulting Section Discussion Board.

Relevance to Conference Goals

The first theme of the Conference on Statistical Practice is "Communication, Impact, and Career Development." This class addresses a very specific type of career development: starting and running an independent consulting business. The course will cover business skills, such as billing, contracts, and marketing, that participants need to advance their careers. It will emphasize communication with clients and customers, both to attract new clients and to keep existing clients happy.

 
SC5 An Overview of Clustering: Finding and Extracting Group Structure in High-Dimensional Data
Thu, Feb 19, 8:00 AM - 12:00 PM
Borgne
Instructor(s): Rebecca Nugent, Carnegie Mellon University; Samuel Ventura, Carnegie Mellon University
Clustering is the search for similar or homogeneous subgroups in a population, say, of consumers, patients, genes, images, text documents, or anything that can possibly contain group structure. For example, consumers might be divided into market segments based on their preferences and spending habits. In public health, we might be interested in predicting which outcome group a patient is likely to be in given their symptoms, past history, and current treatment. In document clustering, the goal is to group similar pieces of text (e.g., blogs, emails, posts, letters, articles) based on the words used, their frequency, and other text features. In all cases, the goal is to extract structure from potentially high-dimensional data. The difficulty, however, often lies in which clustering approach to adopt, particularly given that results are rarely independent of approach. This tutorial will give an overview of algorithmic and statistical approaches to clustering with an emphasis on how to choose an approach and its related parameters. Note that while we use the statistical software package R, these methods are available on other platforms.

Outline & Objectives

Our primary goal is to provide the practitioner with a solid background in the variety of available clustering approaches and their related assumptions, necessary parameter choices, cluster shapes and sizes, and advantages/disadvantages. The practitioners will also gain skills in critiquing and interpreting their final cluster solution and identifying unstable or undesirable clusters.

Topics include: (may be interspersed as appropriate)

Deterministic Algorithms
- Hierarchical Linkage Clustering
- K-Means (including fuzzy version)
- K-Medoids

Statistical Approaches
- Parametric mixture models/model-based clustering
- Nonparametric bump hunting or mode finding
- Spectral Clustering or Image Segmentation

Longitudinal Clustering

Validation and Visualization
- Uncertainty
- Cluster Validation Strength
- Silhouettes
- Stripes and Neighborhoods
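
To make two of the listed topics concrete, the sketch below pairs K-Means (Lloyd's algorithm) with the average silhouette width as a simple validation measure. It is a minimal illustration in plain Python rather than the R code used in the course; the toy points, the starting centers, and the function names kmeans and mean_silhouette are all assumptions made for this example.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm: alternate assignment and center updates."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(old)  # keep an empty cluster's center in place
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return labels, centers


def mean_silhouette(points, labels):
    """Average silhouette width; values near 1 suggest well-separated clusters."""
    widths = []
    for i, p in enumerate(points):
        dists = {}  # cluster label -> distances from p to that cluster's other points
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], [])
        if not own or not dists:
            widths.append(0.0)  # convention for singletons or a single cluster
            continue
        a = sum(own) / len(own)                            # mean distance within own cluster
        b = min(sum(d) / len(d) for d in dists.values())   # nearest other cluster
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / len(widths)


# Toy data: two visibly separated groups (invented for illustration).
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]
labels, centers = kmeans(points, centers=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(round(mean_silhouette(points, labels), 3))
```

K-Means results depend on the starting centers and on the chosen number of clusters, which is one reason a validation measure such as the silhouette is useful when comparing candidate solutions.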


About the Instructor

Professor Nugent is an Associate Teaching Professor in the Department of Statistics at Carnegie Mellon University. Her research primarily focuses on finding and visualizing high-dimensional structure. She was the 2009 Chikio Hayashi Award recipient (a Young Promising Researcher award presented by the International Federation of Classification Societies). She has served as the President of the Classification Society (of North America) and is active in the ASA Sections on Statistical Computing and Statistical Graphics. She has taught undergraduate and graduate classes in statistical learning, regression, document clustering, and record linkage, among others. She has also won several teaching awards, including the Elliott Dunlap Smith Award for Distinguished Teaching and Educational Service.

Samuel L. Ventura is a PhD Candidate in the Department of Statistics at Carnegie Mellon University. His research focus is on large-scale clustering and classification techniques. He also brings extensive statistical computing experience. Sam has been an invited speaker at several statistical learning conferences and has taught several summer courses at CMU.

Relevance to Conference Goals

With the advent of "Big Data" sets and cheaper, more ubiquitous data collection, we have more data than we can handle. Describing and characterizing the structure in these high-dimensional data sets is paramount. Being able to reduce the complexity of your statistical analysis by homing in on the underlying group structure may increase your analysis options. In addition, with a dual focus on making informed decisions about the choice of clustering approach and on summarizing, visualizing, and interpreting the final clusters, attendees will be more confident and better positioned to interface with their clients and deliver statistically sound results that directly correspond to real, implementable action items in practice (e.g., a different strategy for each group).

 
SC6 Building Your Professional Brand
Thu, Feb 19, 1:30 PM - 5:30 PM
Napoleon D3
Instructor(s): Bill Williams, Organizational Learning Consultant

The world of work is full of people with ambition and aspirations to do bigger things as their careers progress. While the rules for success, most of which are unwritten, vary from organization to organization, two ingredients are always essential: 1) your current performance on the job and 2) the potential other people see in you. How people view your performance and potential derives only in part from what you know and the functional expertise you possess. The rest is based on the image you project and the exposure to other people your job affords you. In this session, we’ll examine both the impression you want others to have of you as a professional (your “brand”) and how your communications can influence the impressions of others. You will define the brand you would most like people to associate with you and consider how to manage your behavior to support your brand, particularly when communicating with senior managers and leaders.

Outline & Objectives

Objectives

Understand what is meant by the term “personal brand”
Identify the characteristics of your ideal personal brand – the experience you want others to have when working with you
Determine actions you can take and behaviors you can display when working with others – whether in-person or virtually – that will support your ideal personal brand

Outline
What is a brand and what characterizes a brand, for better or worse?
What brands do you value and why? Whose “personal brand” do you respect?
Brainstorm: what do you ideally want to characterize your personal brand?
- Values
- Capabilities: talents, skills and knowledge
- Behavioral characteristics: influencing style, composure, communication style
Who are your key constituents at work?
- What impressions of you do you want them to have?
- What do you want them to value in the way you support and collaborate with them?
Putting it all together – bringing your personal brand to life:
- With individuals in-person
- With groups in-person
- In writing
- In virtual communication

About the Instructor

Bill Williams is an organizational learning consultant and has been part of the Conference on Statistical Practice since its inaugural year.

Relevance to Conference Goals

This session aligns with the communication, impact and career development track by focusing on how to position oneself for success within an organization.

 
SC7 Design of Not-Simple Graphs
Thu, Feb 19, 1:30 PM - 5:30 PM
Borgne
Instructor(s): Richard M. Heiberger, Temple University

Complex data analyses may require complex graphs to place the full information of the analysis into a form the intended client will be able to read. In our opinion, graphs are the heart of most statistical analyses; the corresponding tabular results are formal confirmations of our visual impressions. Data analysts are responsible for the display of data with graphs and tables that summarize and represent the data and the analysis. The graphs are often the best means of communication between the data analyst and client. This course will emphasize the design of graphical displays that best represent the message of an analysis.

Outline & Objectives

We will look at many examples of graphs, from simple to complex. We need to begin with simple graphs to learn the vocabulary of graphs. We then proceed to more complex graphs and see how they are constructed by using the same graphic vocabulary. The examples come from journal articles, textbooks, and general publications. We will mostly show good examples, but will of necessity show some not-good examples (and then revise them) to emphasize how the principles we recommend have been derived and why they are important for communication between the data analyst and the client. Most of the examples will be from the medical/pharmaceutical areas or from social sciences, but the concepts are much more broadly applicable. The graphs we show will be drawn using the graphics functions in R because that platform offers substantial capabilities for producing graphs customized to the particular needs and visions of the analyst. They could be drawn in any other graphical system that has a reasonably rich set of graphical primitives.

About the Instructor

Professor of Statistics at Temple University. Chair (2011) of the Statistical Computing Section of the ASA. Consulting experience in the pharmaceutical and social science areas. I designed and programmed the AEdotplot, the now standard display for adverse events in clinical trials. I coauthored a paper in the Handbook of Data Visualization. Consultant for a US Government agency on the design of visualizations to make their data more accessible. I have several packages available for the R system. My most recent book, R through Excel (Springer, 2009) with Erich Neuwirth, shows how to access the high quality of R graphics directly from the comfort of the familiar Excel spreadsheet.
I have a recent paper: Heiberger, R., Robbins, N. (2014). "Design of Diverging Stacked Bar Charts for Likert Scales and Other Applications." Journal of Statistical Software, 57(5), 1–32.
I am preparing the second edition (anticipated 2015) of Heiberger, Richard M., and Burt Holland, Statistical Analysis and Data Display: An Intermediate Course with Examples in R, Springer, New York. I presented a session on "Structured Sets of Graphs" at the 2014 CSP.

Relevance to Conference Goals

On completing the course, participants will have examples of and experience with complex graphs. They will be able to look at new data situations and analyses and to design graphs that will communicate the analyst's intended message to the reader. Better communication skills improve performance, and improved performance enhances their professional development.

 
SC8 Text Analytics: Integrating Topic, Opinion, and Sentiment Analysis
Thu, Feb 19, 1:30 PM - 5:30 PM
Napoleon C2
Instructor(s): Edward R. Jones, Texas A&M Statistical Services
This workshop discusses current statistical approaches to conducting a linked analysis of reviewer comments, sentiments, and rating. Today, statisticians have powerful tools available for integrating the analysis of structured and unstructured data. Reviewer and customer comments can be used with their ratings and other background information to build models linking ratings, opinions, and emotions. Done well, this provides a more complete picture of what people think and feel about services and products.

Outline & Objectives

Learning Objectives:

(1) Gain an understanding of the terminology and concepts of opinion, topic and sentiment analysis in text analytics.

(2) Understand an effective process for modeling reviewer comments with structured data.

(3) Understand the available software, both freeware and commercial, for conducting a linked analysis of text and structured data.


Outline:

(1) Introduction to Text Analytics: Terminology and Software - Today and Tomorrow

(2) Opinion and Topic Analysis: Techniques and Tools for Discovering Opinions and Topics

(3) Sentiment Analysis: Extracting Emotional Content

(4) Discussion and Questions

About the Instructor

Over 10 years of experience in development of commercial analytics software, and over 15 years of experience teaching and applying techniques in analytics and quality assurance. Formerly an examiner for the Malcolm Baldrige National Quality Award. Currently teaches advanced analytics at Texas A&M University and mentors graduate students in analytics competitions.


See - http://www.linkedin.com/pub/edward-r-jones/7/ba4/93/

Relevance to Conference Goals

Attendees enjoy CSP because of its applied nature and because they leave with new knowledge and tools useful in their career and work. For many, this workshop provides a new tool: a statistical approach for exploring the relationship between what people think and what they say and feel.

What people think is often captured as structured data, usually in the form of answers to closed questions such as "On a scale from 1 to 5, how satisfied are you with our product?"
What people say is acquired from answers to open-ended questions such as "Why are you satisfied or not satisfied with our product?"
What people feel about what they are saying is discovered using sentiment analysis.
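
The linkage described above can be sketched in a few lines. This is a minimal illustration in Python, not the workshop's method or software: the mini-lexicon, the example reviews, and the function names are all invented for this example, and a real analysis would use a curated sentiment lexicon and a proper statistical model.

```python
import re
from collections import defaultdict

# Hypothetical mini-lexicon: word -> sentiment weight (assumption for illustration).
LEXICON = {"great": 1, "love": 1, "reliable": 1, "slow": -1, "broke": -1, "terrible": -1}

# Invented example reviews: (rating on a 1-5 scale, free-text comment).
REVIEWS = [
    (5, "Great product, I love it"),
    (4, "Reliable but a little slow"),
    (2, "It broke after a week"),
    (1, "Terrible support and slow shipping"),
]


def sentiment_score(comment):
    """What people feel: sum the lexicon weights of the words in a comment."""
    words = re.findall(r"[a-z']+", comment.lower())
    return sum(LEXICON.get(w, 0) for w in words)


def sentiment_by_rating(reviews):
    """Link the structured rating (think) to the comment sentiment (feel)."""
    scores = defaultdict(list)
    for rating, comment in reviews:
        scores[rating].append(sentiment_score(comment))
    # Mean sentiment per rating level, ordered by rating.
    return {rating: sum(s) / len(s) for rating, s in sorted(scores.items())}


print(sentiment_by_rating(REVIEWS))  # {1: -2.0, 2: -1.0, 4: 0.0, 5: 2.0}
```

In this toy data the mean sentiment rises with the rating; reviews where the two disagree are often the most informative ones to read.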


This workshop gives participants the background needed to start applying opinion, topic and sentiment analysis in their work. The approaches and tools are illustrated using customer review data.