Viewing session type: Short Course (half day)
Thursday, February 20
Thu, Feb 20
8:00 AM - 12:00 PM
Regency C
SC4 - Side-by-Side Learning of R and Python by Analyzing Big Longitudinal Data
Short Course (half day)
Instructor(s): Mohammed Rahim Uddin Chowdhury, Kennesaw State University
R and Python are two widely used open-source interpreted programming languages, each with a large and diverse community. Because both are open source, new libraries are continuously developed and added to their respective catalogs as new mathematical, statistical, or other models are discovered. R has more than 12,000 packages available on CRAN (its open-source package repository), which researchers can use to perform whatever analysis they need. This rich variety of libraries makes R the first choice for statistical analysis, especially for specialized analytical work. Python, on the other hand, does not have as many packages for data analysis and data modeling. Most data science work can be done with five Python libraries: NumPy, pandas, SciPy, scikit-learn, and seaborn. However, the scientific community recognizes that Python is catching up with R by rapidly developing packages for data mining and statistical modeling. In this short course at CSP 2020, I will show in detail side-by-side comparisons between R and Python on six topics: data mining and data analysis, tests of hypotheses, correlation and regression, simulation, mathematical computations, and text mining.
Outline & Objectives
This short course discusses the application of R and Python to problems in:
1. Data mining and data analysis (50 different data mining problems)
2. Tests of hypotheses and confidence intervals (20 different problems)
3. Regression models (16 different models will be discussed)
4. Simulations (9 different simulation designs will be discussed)
5. Mathematical computations (50 different problems will be computed)
6. Text mining (word clouds, sentiment analysis, and graphs of the most frequently used words will be discussed)
The objective of this short course is to train participants to use R and Python side by side to solve problems from the topics above in their professional work. Participants are not required to have prior knowledge of R or Python. The instructor will provide all the problems in an easily understandable question format, together with the R and Python code. The instructor will first discuss each problem and then run the R and Python code together with the participants.
About the Instructor
I obtained my PhD in Statistics in 2013 and have been working as a tenure-track Assistant Professor of Statistics in the Department of Statistics and Analytical Science at Kennesaw State University since August 2015. During my four years at KSU, I have taught ten unique undergraduate and graduate courses in total, which is more than two new courses per year. Five are undergraduate courses, ranging from introductory statistics to R and Python programming. I was motivated to teach Python programming because it is in high and growing demand in industry, and many employers want data engineers with Python expertise. The other five are graduate courses. I taught a special topics course on theoretical and computational Bayesian statistics for graduate students, using R to teach the computational parts, such as the EM algorithm, MCMC, Gibbs sampling, the Metropolis algorithm, and the Metropolis-Hastings algorithm. Another graduate course is Applied Time Series Analysis. I prefer R for teaching most courses. I taught the undergraduate R programming course in Fall 2018 and the Python programming course in Spring 2019.
Relevance to Conference Goals
The Conference on Statistical Practice is usually considered a platform for applied researchers who use novel statistical and machine learning methods to solve data-driven problems. R and Python both offer built-in packages for solving such problems. This short course will introduce both R and Python for analyzing big longitudinal data. In addition, various simulation designs and text mining will be discussed. This course will help anyone interested in learning R and Python from scratch.
Thu, Feb 20
8:00 AM - 12:00 PM
Regency D
SC5 - Essential Collaboration: The ASCCR Frame
Short Course (half day)
Instructor(s): Heather Smith, Cal Poly; Eric Vance, LISA-University of Colorado Boulder
Statisticians and data scientists often collaborate with domain experts from many different fields in academia, business, and government. Learning more effective collaboration skills will enable us to maximize our professional impact in these areas. In this short course, participants will learn and practice essential skills that will enable them to improve their collaborations and add more value to their projects, customers, and organizations. We introduce the ASCCR framework that describes our current best practices for five aspects of statistical consulting and collaboration (Attitude-Structure-Content-Communication-Relationship). Specifically, participants will learn how to establish foundational collaborative Attitudes, implement the POWER Structure for conducting effective meetings, apply the Q1Q2Q3 approach to consultations and collaborations, Communicate more effectively, and adopt practical strategies to strengthen Relationships. Participants will practice these skills via team exercises, role-plays, video coaching, and individual reflections to become more effective collaborators, allowing them to have greater impact in their roles as statisticians and data scientists.
Outline & Objectives
Our objective is to introduce key concepts that will help participants improve their collaboration skills so they can return to key roles within their organizations and achieve greater impact. This short course will be useful for all levels from beginning to advanced. Prerequisites are a desire to improve one’s personal effectiveness and openness to try new methods and ways of thinking in the practice of statistics and data science.
1 Welcome and warm-up team exercises
2 Introduction to ASCCR Frame
3 Attitude of effective collaboration (participants complete Attitude checklist)
4 POWER structure (Prepare-Open-Work-End-Reflect) and why we believe this structure produces effective meetings
5 Best practices for opening meetings (Eric and Heather mock role play, video review, then participants role play)
6 Best practices for ending meetings (Eric and Heather mock role play, video review)
Break
7 Q1Q2Q3 approach to the Content of statistical projects (reflection exercise)
8 Triangle of Statistical Communication (team discussion)
9 Tips for strengthening Relationships (reflection exercise)
10 Overall written reflection and individual plan for improving collaboration skills.
About the Instructor
For the past 11 years, Dr. Eric Vance, an Associate Professor at the University of Colorado Boulder, has been the director of LISA (Laboratory for Interdisciplinary Statistical Analysis) where he has trained 271 statisticians to move between theory and practice to collaborate with 9500+ domain experts to apply statistics and data science to answer their research or business questions. He has taught workshops and webinars on collaboration in nine countries around the world, including several in collaboration with Heather Smith.
Heather Smith has 28 years of experience consulting with academic, industrial, service, and government clients in the United States, Europe, and Asia. She began this work as a statistical consultant at Westat, Inc. For 21 years she has been a faculty member in the Statistics Department at Cal Poly San Luis Obispo where she consults with academic and private sector researchers and teaches a wide variety of applied statistics courses, including courses in statistical communication and consulting. She has offered over a dozen workshops, short courses, and webinars on these topics, and has trained hundreds of statistical collaborators.
Relevance to Conference Goals
This short course is relevant to all three main conference goals. Participants will learn new skills and practical tips to apply whenever they interact with another person in their job as an applied statistician. Participants will explicitly learn how to better communicate and collaborate with their clients and customers. Skills learned in the course will equip participants to have a positive impact on their organization and an upward career trajectory. Participants will return to their jobs with new ideas, techniques, and strategies to improve their ability to communicate and collaborate effectively, resulting in a greater impact on their organizations and increasing the overall impact of statistics and data science in the world at large.
A version of this course was taught at the 2018 CSP and received a high average rating of 4.63 out of 5 (n=8 responding out of 22 participants). The official qualitative feedback we received: “This course is essential for any statistician who needs to collaborate with people in other disciplines, or sell their business to clients. I very strongly recommend it.” Unofficial feedback was very positive as well.
Thu, Feb 20
1:30 PM - 5:30 PM
Regency C
SC6 - Increasing Business Impact Through Automated Reporting in R
Short Course (half day)
Effective communication of results is among the essential duties of the industrial statistician, but the sometimes tedious mechanics of report production, together with the sheer volume of data that many statisticians must now process, make report design an afterthought in too many cases. In this half-day course, we review recent advances in automated report production that free statisticians to focus on the interpretation and communication of results, while simultaneously reducing errors and increasing the consistency of analyses. We teach the course through an extended example, cumulatively building an R script that takes participants from receipt of an example dataset to a beautifully designed and nearly complete PowerPoint presentation, produced automatically using freely available, open-source packages. Details of how to customize the final presentation to incorporate corporate branding (such as logos, font choices, and color palettes) will also be covered.
Level: We recommend a minimal level of experience using R, RStudio, and the tidyverse.
Outline & Objectives
With this half-day course, we help industrial statisticians increase their business impact by leveraging tools for automated report production in R.
Topics covered include:
* What does automated reporting mean in practice?
* Scripting analyses, tables, and charts
* Automated production of PowerPoint presentations
* Building a "cookbook" of reporting recipes
* Font choices and color palettes
* Layering storytelling onto an automated report
About the Instructor
Dr. John Ennis is president of Aigora (www.aigora.com), a consulting and coaching organization dedicated to helping market researchers prepare for the rise of artificial intelligence. As part of this preparation, Aigora provides instruction in the automation of standard work practices, including report preparation. Dr. Ennis, a Ph.D. mathematician who conducted his postdoctoral training in computational neuroscience, has 11+ years of market research consulting experience, has presented at JSM and CSP, and will have presented at SDSS by the time of CSP 2020. In addition, Dr. Ennis is the author of over 30 peer-reviewed publications and two books on quantitative market research topics. Earlier this year, Dr. Ennis branched out from the Institute for Perception to found Aigora. In his prior work, he was a well-reviewed instructor at dozens of short courses covering quantitative market research, including instruction on topics within data science. In his professional work, Dr. Ennis has used tools for automated reporting for approximately five years, and he now teaches such tools to clients at a variety of enterprise-level businesses.
Relevance to Conference Goals
Through participation in this course, attendees will learn to support their internal clients with well-designed and easy-to-read reports they prepare quickly and can continually improve over time, building their credibility and influence within their organizations.
Thu, Feb 20
1:30 PM - 5:30 PM
Regency D
SC7 - Building LaTeX Templates for R Markdown to Produce Branded PDF Reports
Short Course (half day)
Instructor(s): Ben Barnard, Wells Fargo
Branded reports give a clean, clear, and consistent message for data science teams in an organization. We walk through the process of building a LaTeX template distributed through an R package. We begin with a short introduction to rmarkdown and some motivating examples for using branded reports. Then, we demonstrate from scratch how to build a minimal LaTeX template and distribute it in an R package. We describe some best practices for branding and highlight the use of ggplot2 themes to match document branding. Finally, we walk through some further uses, such as parameterized reports, using the template with bookdown, and recommendations for deploying the R package at your company.
Outline & Objectives
Students should leave this class with:
1. a general understanding of rmarkdown,
2. an understanding of why branded reports are important,
3. an R package with a LaTeX template that uses their company's branding,
4. an understanding of best practices in branding,
5. experience using ggplot2 themes,
6. and some possible further uses for the template and its distribution.
About the Instructor
Ben Barnard is a Data Scientist at Wells Fargo in the Team Member Insights group. Ben has a PhD from Baylor University in Statistics.
Jeff Idle is an Analytic Manager at Wells Fargo in the Team Member Insights group. Jeff leads the HR Advanced Analytics & Architecture team. Jeff is currently pursuing an MBA from the University of Minnesota's Carlson School of Management.
Relevance to Conference Goals
We stress using branded reports to communicate clean, clear, and consistent messages to your audience. Communication is the most important part of data science, since decision makers are rarely analytic experts. Branded reports bring a professionalism that administrators will greatly appreciate. Building LaTeX templates saves time and ensures every report comes out looking the same. Consistently branded reports allow your team to be recognized immediately by its work product.