All Times ET

Tuesday, February 1

Tue, Feb 1
10:00 AM - 5:30 PM
Virtual

SC01 - Essential Communication and Collaboration
Short Course (full day)

Instructor(s): Ilana A. Trumble, LISA-University of Colorado Boulder and UC Anschutz; Eric Vance, LISA-University of Colorado Boulder

Statisticians and data scientists must communicate and collaborate with domain experts from many different fields in academia, business, and government. Learning more effective communication and collaboration skills will enable us to maximize our professional impact in these areas. In this short course, participants will learn and practice essential skills that will enable them to improve their communication and collaboration to add more value to their projects, customers, and organizations. We introduce the ASCCR framework that describes our current best practices for five aspects of statistical consulting and collaboration (Attitude-Structure-Content-Communication-Relationship). We will focus especially on the communication skills of asking great questions; listening, paraphrasing, and summarizing; and explaining statistics to non-statisticians to create shared understanding with our clients and collaborators. Participants will practice these skills via team exercises, role-plays, video coaching, and individual reflections to become more effective communicators and collaborators, enabling them to have greater impact in their roles as statisticians and data scientists.

Outline & Objectives

Our objective is to help participants improve their communication and collaboration skills so they can achieve greater impact. This short course will be useful for participants at all levels, from beginning to advanced. Prerequisites are a desire to improve one’s personal effectiveness and an openness to trying new methods and ways of thinking in the practice of statistics and data science.

1 Welcome, team assignments, and warm-up exercises
2 Introduction to the ASCCR Framework
3 Attitude of effective collaboration (checklist and exercise)
4 POWER structure (Prepare-Open-Work-End-Reflect) produces effective meetings
5 Best practices for opening meetings (Eric and Heather mock role play, video review, then participants role play)
6 Q1Q2Q3 approach (reflection exercise)
7 Triangle of Statistical Communication
a. Asking Great Questions (participant role play)
b. Listening, Paraphrasing, Summarizing (video clip review)
c. Explaining Statistics to Non-statisticians (video clip and role play)
d. Creating Shared Understanding
8 Strengthening Relationships (reflection exercise)
9 Best practices for ending meetings (participants role play)
10 Individual plans for improving communication and collaboration

About the Instructor

For the past 13 years, Dr. Eric Vance has been the director of LISA (Laboratory for Interdisciplinary Statistical Analysis), where he has trained 285 statisticians and data scientists to move between theory and practice and to collaborate with more than 9,700 domain experts, applying statistics and data science to answer their research, business, or policy questions. He has taught workshops and webinars on collaboration in nine countries, including several with Heather Smith at CSP and JSM. This workshop gets better every time they teach it.

Heather Smith has 30 years of experience consulting with academic, industrial, service, and government clients in the United States, Europe, and Asia. She began this work as a statistical consultant at Westat, Inc. For 23 years she has been a faculty member in the Statistics Department at Cal Poly San Luis Obispo where she consults with academic and private sector researchers and teaches a wide variety of applied statistics courses, including courses in statistical communication and consulting. She has offered over a dozen workshops, short courses, and webinars on these topics, and has trained hundreds of statistical collaborators.

Relevance to Conference Goals

This short course is relevant to Themes 1 and 4. Participants will learn new skills and practical tips to apply whenever they interact with other people, and they will explicitly learn how to better communicate and collaborate with their clients and customers. The skills learned in the course will equip participants to have a positive impact on their organizations and to advance their careers. Participants will return to their jobs with new ideas, techniques, and strategies for communicating and collaborating effectively, resulting in greater impact on their organizations and increasing the overall impact of statistics and data science.

A version of this course was taught at the 2018 CSP and received a high average rating of 4.63 out of 5 (n = 8 respondents out of 22 participants). The official qualitative feedback we received was: “This course is essential for any statistician who needs to collaborate with people in other disciplines, or sell their business to clients. I very strongly recommend it.” Unofficial feedback was very positive as well. A version of this course was also taught at the 2020 CSP, but we don’t recall receiving any official feedback.

Tue, Feb 1
10:00 AM - 5:30 PM
Virtual

SC02 - Hands-On Introduction to Python in Predictive Analytics and Machine Learning
Short Course (full day)

Instructor(s): Mei Najim, The University of Chicago

This introductory course provides a hands-on introduction to Python, the well-known open-source programming language for analytics. We will start with an introduction to Jupyter Notebook and Python basics, then cover the most popular data science libraries (NumPy and Pandas), data visualization libraries (Matplotlib and Seaborn), and the machine learning library Scikit-learn (sklearn).

We will introduce a Predictive Analytics Life Cycle Process through a case study to methodically expose attendees to best practices and Python’s rich set of data science libraries, providing hands-on experience and know-how. Lastly, we will use the course material to develop a predictive model from raw data (data TBD). Python code will be provided.

Outline & Objectives

1. Introduction to Jupyter Notebook and Python Basics
2. Introduction to Data Science Libraries: NumPy and Pandas
3. Introduction to Data Visualization and Interactive Data Visualization Libraries: Matplotlib, Seaborn, Plotly, and Cufflinks
4. Introduction to the Predictive Analytics Life Cycle through a Case Study, Using the Machine Learning Library Scikit-learn (sklearn)
a) Exploratory Data Analysis and Data Pre-Processing
b) Supervised Learning: Regression (Linear, Multiple Linear, and Polynomial Regression, Decision Tree, and Random Forest)
c) Supervised Learning: Classification (KNN, Logistic Regression, Decision Tree, and Random Forest)
d) Unsupervised Learning: K-Means Clustering and Principal Component Analysis (Dimensionality Reduction)
5. Using all of the above to develop a predictive model (data TBD): starting from exploratory analysis of the raw data, then Data Visualization, Data Preparation, Feature Engineering, Model Building (using Logistic Regression, Decision Tree, and Random Forest), and Model Performance Evaluation
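The classification step in item 4c and the evaluation step in item 5 can be illustrated without any libraries. The sketch below is a minimal pure-Python k-nearest-neighbors classifier with a reproducible train/test split and an accuracy score. The course itself uses scikit-learn; the function names and the toy data here are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter


def train_test_split(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle indices reproducibly, then hold out test_fraction of the rows for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([rows[i] for i in train_idx], [labels[i] for i in train_idx],
            [rows[i] for i in test_idx], [labels[i] for i in test_idx])


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points nearest to x (Euclidean distance)."""
    neighbors = sorted(zip((math.dist(x, xi) for xi in train_X), train_y))
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: two well-separated clusters described by two features.
X = [[1.0, 1.1], [1.2, 0.9], [0.8, 1.0], [1.1, 1.3], [0.9, 0.7], [1.3, 1.0],
     [5.0, 5.2], [5.1, 4.8], [4.9, 5.1], [5.3, 5.0], [4.7, 4.9], [5.2, 5.3]]
y = [0] * 6 + [1] * 6

train_X, train_y, test_X, test_y = train_test_split(X, y, test_fraction=0.25, seed=1)
preds = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
print(accuracy(test_y, preds))  # 1.0 on this cleanly separated toy data
```

Distance-based methods such as KNN are sensitive to the scale of each feature, which is one reason the Data Pre-Processing step in item 4a matters before modeling.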

About the Instructor

Mrs. Mei Najim currently teaches Programming for Analytics (R & Python) part time at The University of Chicago. Mei has 16 years of hands-on analytics experience in claim management, underwriting, pricing, reserving, and catastrophe risk management in the insurance industry, and in collections analytics in the banking industry. Since 2007, she has mainly worked on and led predictive analytics projects at various levels to develop analytics capability for financial organizations. She has frequently presented at conferences to share her expertise. Mei holds a BS degree in Actuarial Science from Hunan University and two MS degrees, one in Applied Mathematics and the other in Statistics, from Washington State University. Mei is a member of the American Statistical Association and a Certified Specialist in Predictive Analytics (CSPA) of the Casualty Actuarial Society.

Relevance to Conference Goals

The objective is to provide attendees with practical knowledge of using Python to analyze data and to carry out the predictive analytics life cycle by applying state-of-the-art statistical methods and machine learning algorithms.

Tue, Feb 1
10:00 AM - 5:30 PM
Virtual

SC03 - Real-World Data and Evidence: An Interdisciplinary Approach and Applications to Precision Medicine and Healthcare
Short Course (full day)

Instructor(s): Jie Chen, Overland Pharma; Tze Leung Lai, Stanford University

Real-world data and evidence (RWD&E) have been increasingly used in drug development and regulatory decision-making since the passage of the 21st Century Cures Act in December 2016 and the issuance of the FDA’s RWE framework in December 2018. Whereas pharmaceutical companies use RWD&E to support clinical development activities and to seek evidence to inform health technology assessment (HTA) decisions, the healthcare community uses RWD&E to develop guidelines and decisions that support medical practice and to assess treatment patterns, costs, and outcomes of interventions. Although high-performance computing tools, artificial intelligence, and machine learning algorithms have been readily applied to RWD, substantial challenges remain in deriving RWE from RWD and in using that RWE in drug development and healthcare decision-making. This short course aims to provide the audience with practical interdisciplinary approaches and applications of RWD&E in product development, regulatory decision-making, and healthcare delivery, with case studies given throughout the presentation.

Outline & Objectives

Course learning objectives: The audience will learn both commonly used and cutting-edge decision-analytics approaches tailored to specific questions in product development and in regulatory and healthcare decision-making. Case studies are given throughout the short course to illustrate the applications of the methods.

1. Introduction
2. Real World Data
3. Statistical and Machine Learning Methods for Healthcare Decision Analysis
4. Disease Diagnosis, Patient Heterogeneity and Adherence
5. Health Technology and Health Economic Assessment
6. Risk Models and Outcome Prediction
7. Benefit-Risk Assessment
8. Causal Inference Using Real World Data
9. Analysis of Data Generated from Mobile Devices
10. Public Health Surveillance and Pharmacovigilance
11. Real World Data to Support Clinical Development
12. Pragmatic Trials and Comparative Effectiveness Research (CER) Trials
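Item 8 covers causal inference with real-world data. As an illustration of one standard technique (not necessarily the methods taught in this course), the sketch below computes inverse-probability-weighted (IPW) estimates of the average treatment effect. It assumes propensity scores have already been estimated, for example by a logistic regression on measured confounders, that there is no unmeasured confounding, and that every unit has a propensity strictly between 0 and 1. The function names and the four-unit dataset are hypothetical.

```python
def ipw_ate(treated, outcome, propensity):
    """Horvitz-Thompson IPW estimate of the average treatment effect (ATE)."""
    total = 0.0
    for t, y, e in zip(treated, outcome, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        # Treated units are up-weighted by 1/e, controls by 1/(1 - e).
        total += t * y / e - (1 - t) * y / (1 - e)
    return total / len(outcome)


def ipw_ate_normalized(treated, outcome, propensity):
    """Hajek (self-normalized) IPW estimate: weighted treated mean minus weighted control mean."""
    w1 = [t / e for t, e in zip(treated, propensity)]
    w0 = [(1 - t) / (1 - e) for t, e in zip(treated, propensity)]
    mean1 = sum(w * y for w, y in zip(w1, outcome)) / sum(w1)
    mean0 = sum(w * y for w, y in zip(w0, outcome)) / sum(w0)
    return mean1 - mean0


# Hypothetical data: 1 = treated, 0 = control, with a known propensity of 0.5 for everyone.
treated = [1, 0, 1, 0]
outcome = [3.0, 1.0, 5.0, 3.0]
propensity = [0.5, 0.5, 0.5, 0.5]
ate_ht = ipw_ate(treated, outcome, propensity)
ate_hajek = ipw_ate_normalized(treated, outcome, propensity)
print(ate_ht, ate_hajek)  # 2.0 2.0
```

The two estimators agree here because all weights are equal; with estimated propensities near 0 or 1, the self-normalized version is often the more stable of the two.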

About the Instructor

Presenters' background:
1. Tze Leung Lai, PhD: Ray Lyman Wilbur Professor of Statistics and of Biomedical Data Science in the School of Medicine and of the Institute for Computational & Mathematical Engineering (ICME) in the School of Engineering, Stanford University; Director of the Financial and Risk Modeling Institute; Co-Director of the Center for Innovative Study Design at the Stanford School of Medicine; IMS and ASA Fellow.
2. Jie Chen, PhD: Senior Vice President and Head of Biometrics, Overland Pharmaceuticals, and a visiting member of the Center for Innovative Study Design, Stanford University.
3. Richard Baumgartner, PhD: Sr. Principal Scientist with the Biometrics Research Department, Biostatistics and Research Decision Sciences (BARDS), Merck and Co.

Relevance to Conference Goals

This short course will present best practices in statistics for using real-world data and evidence to support drug development and regulatory decision-making.

Tue, Feb 1
10:00 AM - 1:30 PM
Virtual

SC04 - Skills for Statistical Writing: Tips and Tricks for Improving Written Communication
Short Course (half day)

Instructor(s): Emily Griffith, North Carolina State University; Julia Sharp, Colorado State University; Zachary Weller, Colorado State University

Effective writing is an essential skill for statistical practitioners, yet it is a skill that is often overlooked in coursework due to the need to stay up to date on the latest statistical methodology. This course will provide participants with the opportunity to think critically about the writing process and learn about principles and best practices for statistical writing. Participants will improve their writing skills through writing exercises and will be given the opportunity to receive feedback on their writing. The course will address topics such as organizing and streamlining, reducing clutter, best practices for peer review, and statistical aspects of writing such as alternatives to using the term “statistically significant”. The course will engage participants through discussion and short exercises in editing and reviewing writing samples.

Outline & Objectives

Outline:
- Introduction (20 min)
- Introductory Lecture (1 hr): instructors share how they work through the writing process, best practices, and principles of effective writing
- Discussion, Questions, and Conversation (15 min)
- Break (10 min)
- Mini-lectures with exercises (1 hr 45 min, approximately 25 min each):
  [1] organization and streamlining, with exercises in building outlines and telling the story
  [2] peer review: what a peer review should look like, with an exercise in reviewing writing samples and a peer review checklist
  [3] reducing clutter, with a checklist of steps for reducing clutter and exercises in paced, productive, and powerful writing
  [4] statistical aspects of writing, such as avoiding “statistically significant”, with an exercise in rewriting passages
- Closing Discussion (15 min)

Objectives: [1] Improve participants’ confidence and skills in written communication through examples and discussion. [2] Give participants the opportunity to get feedback on their own writing and learn best practices for giving feedback on the writing of others. [3] Provide participants with tips, tricks, and resources for improving their writing and reviewing skills.

About the Instructor

The three instructors (Dr. Zach Weller, Dr. Julia Sharp, Dr. Emily Griffith) for this course have extensive statistical collaboration expertise and PhDs in Statistics. All three instructors have successfully published and peer-reviewed numerous papers in both statistics and applied science journals. The instructors have also been involved in grant writing as both principal investigators and collaborating statisticians.

Relevance to Conference Goals

This short course will increase participants’ confidence in written communication by providing them resources and feedback on the writing and review process. The course will improve participants' skills through short lectures on writing topics followed by exercises and discussion.

Tue, Feb 1
10:00 AM - 1:30 PM
Virtual

SC05 - Using Design of Experiments (DOE) in Industry
Short Course (half day)

Instructor(s): Theodoro Koulis, Genentech; Tony Pourmohamad, Genentech

Design of experiments (DOE) remains the gold standard for the design and development of industrial applications. DOEs can increase efficiency and provide valuable experimental information that may be used to improve industrial processes. Despite DOE’s valuable contributions to various industries, many misconceptions about it persist. This course is geared towards applied practitioners who may not be aware of the strengths and benefits of factorial designs. The course includes real datasets and examples from the biotechnology industry. Course participants will be able to use the lessons learned to design more efficient experiments in their own domains.

Outline & Objectives

Outline: The course covers fundamental design concepts and presents a simple approach to the design and analysis of multi-factor screening designs. Participants will learn how to design, conduct and analyze multi-factor experiments. No prior statistical training is assumed.

Objectives: The course will cover the following topics:
- One-at-a-time vs multi-factor experiments
- Feasible space, design space and center points
- Factorial, fractional factorial, Plackett-Burman designs, and projectability
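The design topics above can be made concrete with a small sketch. Assuming coded levels of -1 (low) and +1 (high), the hypothetical helpers below build a 2^3 full factorial, a 2^(4-1) half fraction using the generator D = ABC (defining relation I = ABCD, so main effects are aliased only with three-factor interactions), and center points at 0, then check balance, orthogonality, and projectability. The course uses JMP for this work; the Python here only illustrates the underlying structure.

```python
from itertools import combinations, product


def full_factorial(k):
    """All 2**k runs of a two-level full factorial in coded units (-1 = low, +1 = high)."""
    return [list(run) for run in product((-1, 1), repeat=k)]


def half_fraction_2_4_1():
    """2^(4-1) fractional factorial: full factorial in A, B, C plus D = A*B*C."""
    return [[a, b, c, a * b * c] for a, b, c in full_factorial(3)]


def add_center_points(design, n_center):
    """Append n_center runs at the center of the coded design space (every factor at 0)."""
    k = len(design[0])
    return design + [[0] * k for _ in range(n_center)]


def is_balanced_and_orthogonal(design):
    """Each column sums to zero and every pair of columns has a zero dot product."""
    cols = list(zip(*design))
    balanced = all(sum(c) == 0 for c in cols)
    orthogonal = all(
        sum(x * y for x, y in zip(cols[i], cols[j])) == 0
        for i, j in combinations(range(len(cols)), 2)
    )
    return balanced and orthogonal


def projects_to_full_factorial(design, cols):
    """True if the chosen columns contain every +/-1 combination exactly once."""
    projected = [tuple(run[c] for c in cols) for run in design]
    return sorted(projected) == sorted(product((-1, 1), repeat=len(cols)))


design = half_fraction_2_4_1()
print(len(design))  # 8 runs instead of the 16 needed for a full 2^4 factorial
print(is_balanced_and_orthogonal(design))  # True
print(all(projects_to_full_factorial(design, c) for c in combinations(range(4), 3)))  # True
```

The last check shows projectability: dropping any one of the four factors leaves a complete 2^3 factorial in the remaining three.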

Participants will be able to design their own multi-factor experiments and analyze the data using simple techniques. The course will use JMP statistical software; participants can use the 30-day free trial version of JMP.
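One of the simple analysis techniques for a two-level design is to estimate each main effect as the average response at the high level minus the average response at the low level. The sketch below assumes coded ±1 factors and uses hypothetical responses generated from a known model, so the result can be checked by hand; it is not the course’s JMP workflow, and the function name is illustrative.

```python
from itertools import product


def main_effects(design, response):
    """Main effect of each factor: mean response at +1 minus mean response at -1."""
    effects = []
    for j in range(len(design[0])):
        high = [y for run, y in zip(design, response) if run[j] == 1]
        low = [y for run, y in zip(design, response) if run[j] == -1]
        effects.append(sum(high) / len(high) - sum(low) / len(low))
    return effects


# 2^3 full factorial in coded units; columns are factors A, B, C.
design = [list(run) for run in product((-1, 1), repeat=3)]
# Hypothetical responses generated from y = 10 + 2*A - 1*B (C has no effect), for illustration only.
response = [10 + 2 * a - 1 * b for a, b, c in design]
print(main_effects(design, response))  # [4.0, -2.0, 0.0]
```

In coded units each main effect is twice the corresponding regression coefficient, which is why the coefficient 2 on A appears as an effect of 4.0. Center points (coded 0) are skipped by the filters above; comparing their average response with the average of the factorial runs is a common check for curvature.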

About the Instructor

Theo Koulis obtained his PhD in Statistics from the University of Waterloo in Canada. His professional interests include computational statistics, design of experiments, and statistical consulting. Theo is a Senior Statistician in Nonclinical Biostatistics at Genentech, Inc., supporting CMC (chemistry, manufacturing, and controls) statistics activities. For over 7 years, Theo has supported manufacturing development at Genentech and has gained practical experience designing and implementing experiments in the biotechnology industry. Once a quarter, Theo teaches a Design of Experiments course geared towards the specific needs of scientists and engineers working in the biotechnology industry.

Relevance to Conference Goals

The course is designed with the applied statistical practitioner in mind. It will use real-world data and examples to showcase the benefits of using DOEs in industry. Although the data generated from DOEs can be analyzed using simple techniques, designed experiments can generate rich and informative datasets. In addition, the course will showcase the JMP DOE toolset, which facilitates the design and analysis of DOEs.

Tue, Feb 1
2:00 PM - 5:30 PM
Virtual

SC06 - Equity and Bias in Algorithms: A Discussion of the Landscape and Techniques for Practitioners
Short Course (half day)

Instructor(s): Emily Hadley, RTI International Center for Data Science

With the growing use of algorithms in many domains, considerations of algorithmic bias and equity have far-reaching implications for society. A developing body of literature highlights the negative impact that biased algorithms can have on individual lives, while new resources provide opportunities for practicing statisticians and data scientists to better incorporate equity into our own work.

In this course, we review the landscape of equity and bias in algorithms. We take a deep dive into specific decision points related to bias and equity throughout the algorithm process, including problem framing, collecting data, completing analyses, and detecting and mitigating bias, and we discuss specific techniques that statisticians and data scientists can use to address these challenges. Attendees will evaluate tools and approaches relevant to their own work. Group discussion is a key component of this course.
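One simple check used when detecting bias is to compare the rate of favorable predictions across groups. The sketch below computes per-group selection rates and the disparate impact ratio (lowest group rate divided by highest), a ratio often compared against the 0.8 “four-fifths” rule of thumb. This is one metric among many and not necessarily one of the tools covered in the course; the function names, group labels, and predictions are hypothetical.

```python
from collections import defaultdict


def selection_rates(groups, predictions):
    """Share of positive predictions (1) within each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for g, p in zip(groups, predictions):
        counts[g][0] += p
        counts[g][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}


def disparate_impact_ratio(groups, predictions):
    """Lowest group selection rate divided by the highest; 1.0 means equal rates."""
    rates = selection_rates(groups, predictions)
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # no group receives any positive predictions, so the rates are equal
    return min(rates.values()) / highest


# Hypothetical model output for ten people in two groups (1 = favorable decision).
groups = ["A"] * 5 + ["B"] * 5
predictions = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]
rates = selection_rates(groups, predictions)
ratio = disparate_impact_ratio(groups, predictions)
print(rates)  # {'A': 0.8, 'B': 0.4}
print(ratio)  # 0.5, well below the 0.8 rule of thumb
```

Equal selection rates do not by themselves establish that a model is fair; other criteria, such as comparing error rates across groups, can point in different directions, which is why the choice of metric is itself a decision point.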

Outline & Objectives

About the Instructor

Emily Hadley is a Research Data Scientist with the RTI International Center for Data Science. Her work spans several practice areas including health, education, social policy, and criminal justice. She has experience with machine learning, natural language processing, agent-based modeling, and predictive analytics, with a strong interest in antiracism, bias, and equity in data science. Emily holds a Bachelor of Science in Statistics with a second major in Public Policy Studies from Duke and a Master of Science in Analytics from NC State.

Relevance to Conference Goals

Tue, Feb 1
2:00 PM - 5:30 PM
Virtual

SC07 - Regression-Style Modeling with Variable Selection and Reduction
Short Course (half day)

Instructor(s): Clay Barker, SAS Institute / JMP Division; Ruth Hummel, SAS Institute / JMP Division

Variable selection is a crucial step in the model-building process, whether we are building a predictive model or trying to understand the results of a designed experiment. Generalized Regression modeling provides a single framework for interactive variable selection and for fitting generalized linear models. This workshop will start with a brief overview of the generalized linear model for modeling responses that are not necessarily normally distributed. We will also introduce variable selection techniques, including stepwise methods such as Forward Selection and penalized regression methods such as the Lasso. We close the workshop with examples featuring both observational and experimental data and a variety of response types.
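To make the penalized-regression idea concrete, here is a minimal sketch of the Lasso fitted by cyclic coordinate descent, a standard textbook algorithm; it is not JMP’s Generalized Regression implementation. It assumes a centered response and centered, non-constant predictor columns with no intercept, and the small orthogonal dataset is hypothetical. With orthogonal standardized columns, the Lasso solution is the least-squares estimate soft-thresholded by lam, which makes the output easy to verify by hand.

```python
def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, max_iter=1000, tol=1e-10):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of rows; assumes centered, non-constant columns and a centered y (no intercept).
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residuals y - X b, with b = 0 initially
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # Covariance of feature j with the partial residual that excludes feature j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta  # keep residuals in sync with b
                b[j] = new_bj
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


# Hypothetical orthogonal, standardized predictors and a response y = 3*x1 + 0.5*x2.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [3.5, 2.5, -2.5, -3.5]
print(lasso_cd(X, y, lam=0.0))  # [3.0, 0.5, 0.0] (ordinary least squares)
print(lasso_cd(X, y, lam=1.0))  # [2.0, 0.0, 0.0] (x2 is dropped, x1 is shrunk)
```

Increasing lam shrinks coefficients and sets some exactly to zero, which is how the Lasso performs variable selection; in practice lam is chosen by a criterion such as cross-validation or an information criterion.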

Outline & Objectives

About the Instructor

Dr. Clay Barker is a Senior Research Statistician Developer with JMP (a division of SAS), working on a variety of statistical platforms in JMP, including Generalized Regression, Fit Curve, and Clustering. He earned his doctorate in statistics from North Carolina State University. He holds several patents, including one for his work on implementing new visualizations for interactive model building in generalized regression.

Dr. Ruth Hummel is an Academic Ambassador with JMP (a division of SAS), supporting the technical needs of professors and instructors who use JMP for teaching and research. Dr. Hummel is a coauthor of Business Statistics and Analytics in Practice, 9th edition (2018), and has been teaching and consulting about statistics and analytics for over a decade, at the University of Florida, at the US Environmental Protection Agency, and now at SAS/JMP. She has a PhD in Statistics from The Pennsylvania State University.

Relevance to Conference Goals


American Statistical Association