Categorical Data Analysis — Professional Development Continuing Education Course
ASA
Instructor(s): Alan Agresti, University of Florida; Ralitza Gueorguieva, Yale University
This short course surveys the most common methods for analyzing
categorical data. The first part of the course focuses on contingency
table analysis, logistic regression for binary data, logistic model
building, and loglinear models. The second part introduces logistic
models for multi-category ordinal and nominal responses and for
clustered data using generalized estimating equations (GEE) and random
effects. The presentation emphasizes interpretation rather than
technical details, with examples including social surveys and
randomized clinical trials. Examples show the use of R, with SAS and
Stata code also given for some examples.
Text Analysis for Statisticians Who Want to Become Data Scientists — Professional Development Continuing Education Course
ASA
Instructor(s): Karl Pazdernik, Pacific Northwest National Laboratory; Robin Cosbey, Pacific Northwest National Laboratory
This course will provide a broad overview of text analysis and natural language processing (NLP), including a significant amount of introductory material but with extensions to state-of-the-art methods. All aspects of the text analysis pipeline will be covered including data preprocessing, converting text to numeric representations (from simple aggregation methods to more complex embeddings), and training supervised and unsupervised learning methods for standard text-based tasks such as named entity recognition (NER), sentiment analysis, and topic modeling. The course will alternate between presentation and hands-on exercises in Python. Most examples will also be translated into R for students more comfortable in that language and support will be given for both Mac and Windows users. Attendees should be familiar with R, Python, or both and have a basic understanding of statistics and/or machine learning. Attendees will gain the practical skills necessary to begin using text analysis tools for their tasks as well as an understanding of the strengths and weaknesses of these tools.
Modern Statistical Learning for Observational Data — Professional Development Continuing Education Course
ASA, Biometrics Section
Instructor(s): David Benkeser, Emory University, Rollins School of Public Health
While clinical trials provide the highest level of evidence to compare clinical treatments or public health interventions, they are often not feasible due to ethical, logistic or economic constraints. Observational studies provide an opportunity to learn about the effect of interventions for which little or no trial data are available. These studies constitute a potentially rich and relatively cheap source of information. However, in such studies, treatment or intervention allocation may be strongly confounded by other important patient characteristics and much care is needed to disentangle observed relationships and infer causal effects. In this course, we will provide an overview of modern techniques for analyzing observational data. We will focus primarily on recent advances in the field of targeted learning, which facilitates the use of state-of-the-art machine learning tools to flexibly adjust for confounding while yielding valid statistical inference. In contrast, conventional techniques for confounding adjustment rely on restrictive statistical models and may, therefore, lead to biased inference. We will discuss methods for inference on the effect of single time-point interventions, including their benefits and limitations. We will also introduce the multi time-point extension of these methods and discuss strategies for dealing with missing data. Methods will be illustrated using data from a recent observational study conducted using electronic medical records. At the conclusion of the course, attendees should be able to link scientific questions of interest to meaningful causal parameters, and perform estimation of those parameters using modern techniques.
An Introduction to R for Non-Programmers — Professional Development Continuing Education Course
ASA
Instructor(s): William Lamberti, University of Virginia
In this one-day course, participants will be introduced to the basics of R. Basic data manipulation, cleaning, and data visualization will be discussed. Learning through examples will be greatly emphasized. Participants are highly encouraged to bring their laptops (Windows, Mac, or Linux are all acceptable) for the examples done during the course. This course is designed for individuals who have little to no experience with object-oriented programming. Familiarity with programming in tools such as SAS will be helpful, but is not required. It is assumed that participants’ baseline familiarity with data analysis tools has been primarily through a graphical user interface such as Excel.
An Outstanding Supervisor: Leading for Motivation, Innovation, and Retention - ADDED FEE — Professional Development Professional Skills Development
ASA
Instructor(s): Shanthi Sethuraman, Eli Lilly and Company
This short course will bring to life the foundational concepts for becoming the ideal supervisor. Attendees will gain a deeper understanding of the essential leadership competencies that will empower them to grow a mentee or direct report, thus enabling them, in turn, to reach their full potential as well. The rewards of this development will cascade through the organization. Participants will learn and understand the expectations and behaviors necessary for becoming a supervisor for whom employees will want to work, increasing their team productivity through an elevated level of engagement. Engagement and fulfillment of employees is achievable when they feel motivated, are challenged to be the best they can be and are able to accomplish more than they thought they could. This course will consist of lecture, videos, and interactive panel discussions where participants will hear from seasoned and successful leaders about how they have learned from their experiences and developed tips and tricks for growing their supervisory skill set. Finally, participants will learn how to measure the right outcomes for enabling sustained growth in this dimension.
It is said that employees do not leave companies, they leave supervisors. While many other leadership courses provide advice to statisticians, statistical analysts, and data scientists on how to be effective leaders, this course focuses on the critical role supervisors/professors/advisors play in their employees’ journeys to becoming strong leaders as well as individuals who propose and drive innovative ideas/solutions and effectively implement them. Strong supervisors model desired employee behaviors, act as sponsors as well as mentors, contribute to their employees’ career satisfaction, support their employees’ work/life balance and generally retain good employees. If you are currently leading a team, managing a group, or considering a supervisory role, this course will help you be more effective. This short course is being offered in collaboration with the Leadership in Practice Committee (LiPCom) of the Biopharmaceutical Section of the ASA.
Fairness in Data Science: Criteria, Algorithms, and Open Problems — Professional Development Continuing Education Course
ASA, Section on Statistics in Epidemiology
Instructor(s): Ilya Shpitser, Johns Hopkins University; Daniel Malinsky, Columbia University; Razieh Nabi, Emory University
Systematic biases present in our society influence the way data is collected and stored, the way variables are defined, and the way scientific findings are put into practice as policy. Automated decision procedures and learning algorithms applied to such data may serve to perpetuate existing injustice or unfairness in our society. Increasing commoditization of statistical and machine learning methods has led to a series of highly publicized instances of learning algorithms producing inappropriate, discriminatory, or otherwise harmful outputs. As a response, a flurry of research activity has aimed to quantitatively describe various aspects of fairness and bias in data science, as well as develop new approaches to learning and estimation from data that take fairness criteria into account. In this one-day short course, we will review a variety of fairness criteria that have been developed, along with algorithms that aim to be ‘fairness-aware’ in various ways, with a particular emphasis on methods rooted in causal inference. We will conclude by describing a variety of methodological and translational problems that remain in this rapidly growing subfield of data science. The course assumes basic familiarity with statistical inference, maximum likelihood, and basic predictive modeling (classification/regression). Some knowledge of causal inference is a plus, but not necessary.
Navigating Tough Conversations in Statistical Collaboration — Professional Development Professional Skills Development
ASA, Section on Statistical Consulting, Caucus for Women in Statistics
Instructor(s): Julia L Sharp, Colorado State University; Emily H Griffith, North Carolina State University
Statistical practitioners face difficult conversations in their interactions with their clients and collaborators. The topics of these conversations vary widely, from completion timelines to the use and interpretation of p-values. While there are no universal guidelines for navigating tough conversations, thoughtful discussion about common experiences and lessons learned; reflection on differences among individuals and situations; and exercises such as role playing can be helpful to prepare and build confidence for engaging in future tough conversations. In this course, we will build participants’ confidence to effectively communicate with clients and customers when challenging topics or situations arise. Specifically, we will:
• Give and solicit examples of difficult conversations often encountered in statistical collaboration.
• Provide suggestions to approach and engage in these difficult conversations through multiple interactive activities, with a focus on leveraging participant strengths by using individual personality and skills to have these conversations in participants’ own style.
• Engage participants in interactive sessions where they learn from each other through discussion, role-playing, and conversations motivated by participants’ questions and recently produced videos portraying several difficult conversations between statisticians and their collaborators.
Caucus of Academic Representatives Chairs Workshop — Other Cmte/Business
ASA
Chair(s): Donna LaLonde, ASA
At JSM on Sunday, August 7th, the Caucus of Academic Representatives (CAR) will hold the sixteenth annual workshop for Chairs of Programs in Statistics and Biostatistics.
This half-day workshop, beginning at 8:30 a.m. ET, will foster discussion between new and experienced chairs on topics of current interest.
There is no additional charge for this workshop; however, to help us plan for food, registration is required. Please register using this form
Machine Learning and Deep Learning — Professional Development Continuing Education Course
ASA, Section on Statistical Learning and Data Science
Instructor(s): Annie Qu, UC Irvine; Xiao Wang, Purdue University; Edgar Dobriban, University of Pennsylvania
This short course is for those who are new to data science and interested in understanding cutting-edge machine learning and deep learning models. It is for those who want to become familiar with the core concepts behind these learning algorithms and their successful applications. It is for those who want to start thinking about how machine learning and deep learning might be useful in their research, business or career development. This one-day short course will provide a comprehensive overview of statistical machine learning and deep learning methods. Topics include classical methods as well as modern techniques including basic machine learning tools, supervised and unsupervised learning, deep neural networks, computational algorithms and software for deep learning, and various applications of deep learning.
Practical Considerations for Bayesian and Frequentist Adaptive Clinical Trials — Professional Development Continuing Education Course
ASA, Section on Bayesian Statistical Science
Instructor(s): Peter Mueller, The University of Texas at Austin; Byron Jones, Novartis; Frank Bretz, Novartis
Clinical trials play a critical role in pharmaceutical drug development. New trial designs often depend on historical data, which, however, may not be accurate for the current study due to changes in study populations, patient heterogeneity, or different medical facilities. As a result, the original study design may need to be adjusted or even altered to accommodate new findings and unexpected interim results. Through carefully thought-out and planned adaptations, the right dose can be identified faster, patients can be treated more effectively, and treatment effects evaluated more efficiently. Reflecting the increasing importance and use of adaptive clinical trials, the International Council for Harmonisation (ICH) has recently tasked a working group to develop harmonized regulatory guidance for these studies in global drug development programs. This one-day short course introduces various adaptive methods for Phase I to Phase III clinical trials using both frequentist and Bayesian methods. Accordingly, we introduce different types of adaptive designs and illustrate practical considerations with case studies. Types of adaptive designs covered in this course include dose escalation/de-escalation and dose insertion designs, adaptive dose finding studies, trials with blinded and unblinded sample size re-estimation as well as adaptive designs for confirmatory trials with treatment or population selection at interim.
A Practical Introduction to the Analysis of Incomplete Data — Professional Development Continuing Education Course
ASA, Biometrics Section
Instructor(s): Ofer Harel, University of Connecticut
Incomplete data is a common complication in applied research. While most practitioners are still ignoring the missing data problem, numerous books and research articles demonstrate that dealing with it correctly is very important. Biased results and inefficient estimates are just some of the risks of incorrectly dealing with incomplete data. The purpose of this course is to demonstrate the importance of dealing correctly with incomplete data, to formulate the missing data problem, and to explain the best practices to deal with this problem. We will therefore introduce incomplete data vocabulary, ad-hoc techniques (e.g. complete case analysis, single imputation), and principled procedures (e.g. maximum likelihood, Bayesian, multiple imputation) to deal with incomplete data. We will emphasize practical implementation of the proposed strategies, including discussion of software to implement procedures for incomplete data, and the advantages and disadvantages of different missing data methodologies. At the conclusion of this course, attendees should be able to understand the complications that arise from incomplete data, to understand and state the missing data assumptions, and to analyze incomplete data. Prerequisites: The course requires knowledge of standard statistical models such as the multivariate normal, multiple linear regression, and contingency tables, as well as basic maximum likelihood for common distributions.
Gaussian Process Modeling, Design, and Optimization — Professional Development Continuing Education Course
ASA, Section on Physical and Engineering Sciences
Instructor(s): Robert Gramacy, Virginia Tech
This course details statistical techniques at the interface between geostatistics, machine learning, mathematical modeling via computer simulation, calibration of computer models to data from field experiments, and model-based sequential design and optimization under uncertainty (a.k.a. Bayesian Optimization). The treatment will include some of the historical methodology in the literature, and canonical examples, but will primarily concentrate on modern statistical methods, computation and implementation, as well as modern application/data type and size. The course will return at several junctures to real-world experiments coming from the physical, biological and engineering sciences, such as studying the aeronautical dynamics of a rocket booster re-entering the atmosphere; modeling the drag on satellites in orbit; designing a hydrological remediation scheme for water sources threatened by underground contaminants; studying the formation of supernovae via radiative shock hydrodynamics; and modeling the evolution of a spreading epidemic. The course material will emphasize deriving and implementing methods over proving theoretical properties.
Developing a System of Statistics for Environmental-Economic Decision Making for the United States — Other Cmte/Business
ASA
Chair(s): Eli Fenichel, White House Office of Science and Technology Policy (OSTP)
The United States depends on nature for economic prosperity and health. The U.S. statistical system currently does not have a core set of environmental measures, and nature is not well-reflected in the U.S. economic statistical system. The U.S. economic statistical system for measuring national economic performance came of age during and immediately following World War II, to coordinate mass production in the wake of the war. Today, our economy has changed, and our planet faces new threats from climate change—from the loss of natural resources once thought inexhaustible, to the degradation of air and water quality—making traditional economic measures less useful for communicating and understanding economic performance. The economic statistical system does not currently reflect investments in conservation or restoration of natural resources or the lost value associated with degrading nature. Therefore, a system of natural capital accounts is important for monitoring and growing the U.S. economy in a sustainable fashion, ensuring environmental conservation, and enabling firms operating in the U.S. to remain competitive.
To better align the Nation’s economic statistical measures with the current state of knowledge, on Earth Day 2022, the Biden-Harris Administration announced an initiative to develop a system of Statistics for Environmental-Economic Decision-making that maintains Natural Capital Accounts and associated Environmental-Economic Statistics. This initiative spans many federal agencies. These agencies are developing a long-term strategy for developing natural capital accounting and associated environmental-economic statistics. The group intends to finalize the strategic plan in January 2023. During this session, federal experts will discuss key elements of that plan, priorities of the effort, and important decision points. The panel, comprised of federal and non-governmental experts, will highlight important elements of a systematic and statistically rigorous approach to natural capital accounting for the United States. Federal agency experts will also discuss how the system builds on years of research. Outside experts on the panel will provide their perspectives on how to maximize the value of this new effort.
This session will be an open discussion with the statistical community about the core statistical elements of such a system, how to design a rigorous system that also meets the needs and desires of multiple uses with high quality standards, and discussion about what future research is needed to support such a system.
Participants:
Eli Fenichel (Moderator), Assistant Director for Natural Resource Economics and Accounting, White House Office of Science and Technology Policy (OSTP)
Scott Wentland, Senior Research Economist, U.S. Bureau of Economic Analysis (BEA)
Kerrie Leslie, Senior Statistician, Statistical & Science Policy Branch, Office of Information & Regulatory Affairs (OIRA), Office of Management and Budget (OMB)
Jacob Malcom, Director of the Office of Policy Analysis, U.S. Department of Interior (DOI)
Maureen Cropper, Distinguished University Professor and Chair of the Economics Department at the University of Maryland
Career Development Panel: Networking Like a Pro: A Guided Networking Session — Professional Development Professional Skills Development
ASA, Committee on Career Development, Caucus for Women in Statistics
The ASA Committee on Career Development (ASA CCD) is hosting a guided networking social for students and early career statisticians to practice in a friendly environment. We will have “pro networkers” discuss various topics, such as introducing yourself confidently, followed by practice time. During the practice sessions, students and early career professionals will be asked to “rotate” to meet and practice with new people (volunteers from industry, government, and academia).
Scalable DNA-Protein Binding Changer Test for Insertion and Deletion of Bases in the Genome
Sunyoung Shin, University of Texas at Dallas; Chandler Zuo, University of Texas at Dallas; Min Chen, University of Texas at Dallas; Yuannyu Zhang, University of Texas Southwestern Medical Center; Jian Xu, University of Texas Southwestern Medical Center; Qinyi Zhou, University of Texas at Dallas
DPQL: A Lossless Distributed Algorithm for Generalized Linear Mixed Model with Application to Privacy-Preserving Hospital Profiling
Chongliang Luo, Washington University in St Louis; Md Nazmul Islam, UnitedHealth Group; Natalie E. Sheils, OptumLabs at UnitedHealth Group; John Buresh, OptumLabs at UnitedHealth Group; Martijn J. Schuemie, Janssen Research and Development; Jalpa Doshi, University of Pennsylvania; Rachel Werner, University of Pennsylvania; David Asch, University of Pennsylvania; Yong Chen, University of Pennsylvania
Density-on-Scalar Single-Index Quantile Regression Model
Shengxian Ding, Department of Statistics, Florida State University; Rongjie Liu, Department of Statistics, Florida State University; Chao Huang, Department of Statistics, Florida State University
Bayesian Analytics in Practice — Professional Development Continuing Education Course
ASA
Instructor(s): Sujit Ghosh, North Carolina State University; Amy Shi, AstraZeneca
The Bayesian paradigm provides a natural and practical way for building complex analytical models by expressing the joint model through a sequence of simpler conditional models, making it useful for various hierarchical data structures. This course will first introduce general notions of Bayesian methods via hierarchical models, and then expand the topic with the more realistic and complex models which have recently emerged from the machine learning literature. These models will be illustrated through practical applications to various real case studies, avoiding much of the theoretical underpinnings. Participants with basic knowledge of probability theory and the statistical inferential framework will find the course useful in expanding their toolkit with the advanced use of Bayesian analytical methods. Popular topics such as prior sensitivity analysis, model comparisons, and uncertainty quantification for machine learning methods will be covered. The concepts and methods discussed will be demonstrated using primarily R and SAS software illustrations developed by the presenters, but the methodologies presented can also be carried out with other software (e.g., Python). Group activities will be encouraged, allowing participants to have a hands-on experience. Lecture materials used for the workshop will be distributed electronically and thus can also be offered virtually.
Evidence-Based Approach in Pediatric Drug Development: Progress and Lessons Learned — Professional Development Continuing Education Course
ASA, Biopharmaceutical Section
Instructor(s): Satrajit Roychoudhury, Pfizer Inc.; Margaret Gamalo, Pfizer Inc.; Robert ‘Skip’ Nelson, Johnson & Johnson
Evidence-based medicine is required for adults and no less so for the pediatric population. Pediatric trials are unique for several reasons: small numbers of patients, limited physiologic data, and ethical complexity all increase the difficulty and costs of pediatric trials. Extrapolation of adult or other pediatric data has facilitated the conduct of pediatric product development trials, subsequent marketing approval, and labeling. This approach reduces the number of children that need to be enrolled and the type of clinical trials that need to be conducted for pediatric product marketing approval.
During the first part of the short course, we will introduce the general scientific framework of extrapolation and provide an overview of available design and analysis approaches. We’ll elaborate on the use of the Bayesian hierarchical model (BHM) and discuss the practicality of the underlying assumptions associated with it. Finally, we’ll discuss extensions of the BHM to handle possible deviations from the underlying assumptions, and implementation using R.
Later, we will focus on the regulatory aspects and case studies. A real-life case study will be included to illustrate the practical implementation and regulatory hurdles. The aim of the short course is to enable participants to apply the extrapolation techniques themselves in real-life trials.
Causal Effects and Their Estimation: A Practical Workflow, from Planning to Application — Professional Development Continuing Education Course
ASA, Section for Statistical Programmers and Analysts
Instructor(s): Clay Thompson, SAS; Michael Lamm, SAS; Yiu-Fai Yung, SAS
When does an effect estimate have a causal interpretation and which effect has an interpretation appropriate for your question? This course provides an overview of causal inference that is designed to answer these types of practical questions when data from an observational or nonrandomized study are analyzed. It describes the differences between possible choices for causal estimands, tools for analyzing a data generating process, and statistical methods that support valid effect estimation. It reviews the definition of causal effects in a potential outcomes framework, discusses estimates for total effects, and describes the decomposition of effects through causal mediation analysis, with an emphasis on dichotomous treatments. Directed acyclic graphs (DAGs) are presented as a tool for representing a data generating process, reasoning about possible data generating processes, and constructing valid estimation strategies. For the estimation of treatment effects, this course discusses the appropriate use of propensity score methods, doubly robust methods, and a regression approach to causal mediation analysis. This course provides a review of the theory behind these methods and then focuses on illustrating their application with examples that use SAS/STAT® software. This material demonstrates a rigorous workflow for causal effect estimation. No prior experience with the methods is assumed.
Practical Solutions for Working with Electronic Health Records Data — Professional Development Continuing Education Course
ASA, Section on Statistics in Epidemiology
Instructor(s): Rebecca Hubbard, University of Pennsylvania; Yong Chen, University of Pennsylvania
This short course will introduce participants to the basic structure of EHR data and provide a practical set of tools to analyze this rich data resource through a combination of lecture and hands-on exercises in R. The first part of the course will cover issues related to the structure and quality of EHR data, including data types and methods for extracting variables of interest; sources of missing data; error in covariates and outcomes extracted from EHR data; and data capture considerations such as informative visit processes and medical records coding procedures. In the second half of the course, we will discuss statistical methods to mitigate data quality issues arising in EHR, including missing data, error in EHR-derived covariates and outcomes, and data integration across multiple clinical practices. R code will be provided for implementation of the presented methods, and hands-on exercises will be used to compare results of alternative approaches. This short course is of interest to researchers without prior experience working with EHR data as well as more experienced individuals interested in learning practical solutions to some common analytic challenges.
An Introduction to Item Response Theory — Professional Development Continuing Education Course
ASA
Instructor(s): Brian Leventhal, James Madison University
Item response theory is a class of models that describes the interaction between persons and questions on a test or survey. These models are used to relate behavior to constructs with applications in test development, item banking, equating, computer adaptive testing, and behavioral and psychological measurement. This course will cover the basic tenets and concepts of item response theory with the goal of having attendees develop an understanding of dichotomous and polytomous models and be introduced to multidimensional item response theory. The instructor will deliver content using a mix of lecture, discussion, engaging interactive examples, and metacognition checks with formative and summative feedback. The course assumes knowledge of basic terminology such as parameter vs statistic but will be taught at an introductory level. Some familiarity with concepts of basic calculus, logistic regression, and maximum likelihood estimation may be helpful, but is not required.
Introduction to Bayesian Methods for Clinical Trial Design and Sample Size Determination — Professional Development Continuing Education Course
ASA
Instructor(s): Matthew A. Psioda, University of North Carolina at Chapel Hill; Joseph G Ibrahim, University of North Carolina
This course is designed to give statisticians with experience in clinical trials research a comprehensive overview of the use of Bayesian methods for trial design and of their implementation using standard software. Applications will be demonstrated using R, SAS or both. Part I will give an overview of Bayesian sample size determination with a focus on fixed sample size trials in the phase II/III setting. Attention is paid to four concepts: (1) sampling priors that reflect knowledge about parameter(s) in the data model, (2) fitting priors used to analyze data, (3) Bayesian sample size determination (SSD) criteria, and (4) monitoring strategies. For (3), a review of Bayesian criteria for SSD will be given (e.g., Bayesian power, average coverage criterion). For (4), multiple strategies will be discussed for monitoring (e.g., predictive probability of success, sequential methods). Part II will focus on advanced Bayesian designs that incorporate information borrowing. The types of designs considered fall into two broad categories: (1) designs that borrow information via an informative fitting prior specified a priori based on one or more historical datasets (e.g., pediatric trials that extrapolate from adult trials), and (2) designs that seek to borrow information across subgroups within a trial (e.g., basket trials).
Statistical Methods in Finance with R — Professional Development Continuing Education Course
ASA
Instructor(s): Rituparna Sen, Indian Statistical Institute
The course is intended for an audience trained in statistics who are interested in getting into quantitative finance. Starting with a basic description of financial data, several applications are covered. These include asset pricing, option pricing, credit scoring and risk management. The methods are illustrated with data from real markets. R code is provided to execute the methods. The course has broad coverage as well as mathematical rigor. References to related work are provided to help attendees pursue areas of specific interest in further detail.
Pre-requisites: Training in statistics and probability including basic descriptive and inferential statistics, regression, multivariate analysis, time-series analysis and stochastic processes. Basics of statistical learning are desirable. Students should be comfortable with R programming. No prior exposure to finance is required.
Statistical and Computational Methods for Microbiome and Metagenomics Data Analysis — Professional Development Continuing Education Course
ASA, Section on Statistics in Genomics and Genetics
Instructor(s): Curtis Huttenhower, Harvard T.H. Chan School of Public Health; Hongzhe Li, University of Pennsylvania
High-throughput sequencing technologies enable large-scale individualized characterization of microbiome composition, functions and community dynamics. The human microbiome, defined as the community of microbes in and on the human body, impacts human health and risk of disease by dynamically interacting with host diet, genetics, metabolism and environment. The resulting microbiome data, together with genomics and metabolomics data, can potentially be used for personalized diagnostic assessment, risk stratification, disease prevention and treatment. New computational and statistical methods are being developed to understand the function of microbial communities by integrating microbiome and other omics data. In this short course, we will give detailed presentations on the statistical and computational methods for measuring various important features of the microbiome based on shotgun metagenomic sequencing data, and on how these features are used as an outcome of an intervention, as a mediator of a treatment and as a covariate to be controlled for when studying disease/exposure associations. The statistics underlying some of the most popular tools in microbiome data analysis will be presented, including bioBakery tools for meta’omic profiling and tools for microbial community profiling (MetaPhlAn, HUMAnN, DADA2, DEMIC, etc.), together with advanced methods for compositional data analysis and kernel-based association analysis.
Best Practices in Project Management and Quality for Statisticians and Data Scientists — Professional Development Continuing Education Course
ASA, Section on Statistical Consulting
Instructor(s): Michiko Wolcott, Msight Analytics
If you work on analysis projects using real-life data, you face a variety of challenges. Many of these challenges are not strictly statistical yet are often unique to projects in applied statistics and data science. They include issues with the quality of source data, difficulties in planning and scoping due to unknowns, challenges with project expectations and timelines, and the need to balance replicability and reproducibility with project fluidity, among others.
In this workshop, we present best practices and methodologies to address key challenges in the non-statistical aspects of our work: project management and delivery, project quality, and data quality. We discuss the application of project management best practices to statisticians and data scientists and how it relates to the quality of projects. We then translate broadly recognized quality ideas to statistical practice itself. Finally, we provide an overview of industry practices in data quality and management and present a methodology for ensuring the quality of data used for analysis.
The workshop is software-agnostic and no specific background is assumed. Participants are encouraged to “bring” their own projects (data, computing environment, scripts, etc.) to use as case studies for themselves while following along (project details and data need not be shared).
Introduction to Process Mining — Professional Development Continuing Education Course
ASA
Instructor(s): Yoann Valero, Université de Technologie de Troyes / Livejourney; Frédéric Bertrand, Troyes Technology University
This course introduces the basics of Process Mining:
from process event logs, Process Mining discovers the true underlying
process generating the data. The resulting process model may then either be compared to a pre-existing theoretical model through conformance
checking, or evaluated on its own with multiple metrics regarding
fitness to data and model complexity, allowing for its in-depth analysis
and optimization.
After explaining the terminology pertaining to processes and event logs,
the course will first teach basic sequence analysis, Process Maps and
Petri Net representations. It will then move on to Process Discovery
with the explanation of the Heuristics Miner algorithm, followed by multiple
process model evaluation metrics. All notions will be illustrated via
a practical session with the R language.
Basic knowledge of Lagrange multipliers, Mathematical Logic and Information
Theory is needed to follow this course. In terms of programming
skills, a good grasp of the R language is required.
Julia for Data Science and Statistical Computing — Professional Development Continuing Education Course
ASA, Section on Statistical Computing
Instructor(s): Hua Zhou, UCLA; Josh Day
Julia (http://julialang.org) is a modern open source programming language for technical computing. Its design offers much greater speed and productivity compared to R or Python, as high-performance code does not need to be wrapped in a low level language like C or Fortran. After almost a decade of active development, Julia reached its first major release v1.0 on Aug 8, 2018 and is quickly gaining popularity in the communities of scientific computing and data science. This course comprises two parts. The first part introduces the Julia package ecosystem for data science, including data ingestion and cleaning, visualization, out-of-core processing, model fitting, and general analytics. The second part covers statistical computing using Julia. It begins with a comparison between Julia, R, and Python, and continues with a tutorial on using Julia for numerical linear algebra, numerical optimization, parallel/distributed computing, and GPU computing. Presenter Dr. Hua Zhou from UCLA has extensive experience in teaching statistical computing and Julia in university classrooms and conference venues. Presenter Dr. Josh Day from Julia Computing is a core developer of the JuliaDB and OnlineStats packages.
Spatial Modeling and Visualization Using R-INLA: Applications in Disease Risk Mapping and Species — Professional Development Continuing Education Course
ASA
Instructor(s): Paula Moraga, King Abdullah University of Science and Technology
In this course, we will learn how to develop spatial models using the R-INLA package to estimate disease risk, quantify risk factors, and predict species distributions. We will also learn how to create data visualizations such as interactive maps, and introduce presentation options such as interactive dashboards and Shiny web applications that facilitate the communication of insights to collaborators and policy makers. We will work through several fully reproducible examples of disease mapping and ecology applications using real-world data such as cancer in the USA, malaria in The Gambia, and sloths in Latin America. The course materials are drawn from the book 'Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny' by Paula Moraga (2019, Chapman & Hall/CRC). URL: https://www.paulamoraga.com/book-geospatial/. Participants are assumed to be familiar with R, and a working knowledge of generalized linear models is recommended.
End-to-End Modeling and Machine Learning with SAS Viya for Learners — Professional Development Computer Technology Workshop
ASA, SAS
Instructor(s): Jacqueline Johnson, SAS Institute
The workshop will demonstrate the use of SAS Visual Analytics and SAS Model Studio within the SAS Viya for Learners platform. SAS Viya for Learners is a suite of free, cloud-based software for teaching and learning data science skills available to academic faculty and students. Participants will be introduced to several methods to train supervised machine learning models to make better decisions on big data. Topics that will be demonstrated include predictive modeling techniques such as linear and logistic regression, decision trees and ensembles of trees such as forests and gradient boosting, neural networks, and support vector machines. A case study example will be used to guide participants through the steps of an analysis, including data preparation, model building, and model assessment. A general introduction to and overview of the SAS Viya interface, including SAS Drive, will also be presented. Familiarity with predictive modeling techniques is helpful, but not necessary.
Creating Reproducible Reports and Customized Tables with Stata — Professional Development Computer Technology Workshop
ASA, StataCorp
Instructor(s): Kristin MacDonald, StataCorp
Effectively reporting results is a crucial step in statistical analyses. Whether you are computing summary statistics, performing survival analysis, fitting multilevel models, performing Bayesian analysis, or using any of Stata's other statistical analysis features, you will want to report the results of your analysis. Stata's tools for creating customized tables, dynamic documents, and reproducible reports allow you to perform your analyses and create reports at the same time. This workshop will first introduce Stata's commands for creating and customizing tables of results and exporting those tables to Word, Excel, LaTeX, PDF, Markdown, HTML, and other formats. We will then demonstrate how to produce complete reports with formatted text, tables of statistical results, graphs, and more. We will see two workflows for creating reports: using Markdown to create HTML and Word documents, and using Stata commands to create customized Word, Excel, and PDF documents. Knowledge of Stata is helpful but not required.
Clustering with SAS Software — Professional Development Computer Technology Workshop
ASA, SAS
Instructor(s): David Kessler, SAS
Clustering is the process of discovering groups within unlabeled data. For example, clustering based on limited and indirect information could identify individuals at high risk of a disease or potential customers for a new product. The investigator might also be interested in the strength of the distinction between clusters. Some clustering techniques are computationally expensive, and the investigator might need approximation techniques for larger data sets. There are many clustering methods that can serve the investigator’s goals while respecting practical and theoretical constraints. This workshop introduces several of these methods and approaches to clustering as they are implemented in SAS software, including k-means clustering, Gaussian mixture models, and hierarchical clustering. The workshop also illustrates techniques of estimation, model fitting, and scoring, and discusses the expectation-maximization, nearest-neighbors, and variational Bayes approaches. Finally, it demonstrates the advantages and limitations of these techniques in different applications. Attendees should have a basic familiarity with estimation. At the conclusion of the workshop, attendees will have a broad understanding of clustering techniques and will be able to use a variety of SAS procedures and products to apply these techniques.
Survival Analysis of Interval-Censored Event-Time Data in Stata — Professional Development Computer Technology Workshop
ASA, StataCorp
Instructor(s): Xiao Yang, StataCorp
This workshop covers the use of Stata to perform survival analysis of interval-censored event-time data. In survival analysis, interval censoring occurs when the event time is not exactly observed but is only known to lie within some time interval. Survival data that contain a mixture of uncensored, right-censored, left-censored, and interval-censored observations are called interval-censored event-time data, and these data arise in many areas, including medical, epidemiological, economic, financial, and sociological studies. Ignoring interval censoring will often lead to biased estimates. The workshop will provide a brief introduction to interval-censored data and will demonstrate how to fit parametric survival models and the semiparametric Cox model for interval-censored data in Stata. How to interpret results and plot the survivor function will be discussed with accompanying examples. A number of examples demonstrating how to graphically evaluate goodness of fit and how to graphically check the proportional-hazards assumption will also be presented. No prior knowledge of Stata is required, but basic familiarity with survival analysis will prove useful.