Saturday, February 22
Sat, Feb 22
2:00 PM - 4:00 PM
Regency A
T1 - Applied Use of R, GitHub, and Markdown for Reproducible Workflows for Small Data Teams
Tutorial
Many organizations have limited personnel and resources available for building efficient data workflows. As organizations grow, solid documentation of processes, reproducible analyses, and systematic collaboration tools become essential for maintaining efficient workflows.
This tutorial will walk through setting up documentation and reproducibility using R, GitHub, and Markdown for emerging data scientists and small data teams. Participants will learn best practices for documentation and collaboration, as well as the essential elements of reproducibility, through hands-on training in RStudio.
Following this session, participants will have the tools to return to their organizations ready to build reproducible, documented data workflows.
Outline & Objectives
Students will obtain the following hands-on skills:
1. Foundational understanding of why documentation and reproducibility are important.
2. Setting up and installing the software required to build workflows in RStudio and GitHub and to write documentation in R Markdown.
3. Understand the necessary components of reproducibility, including:
a. Identified data sources
b. Clear workflows and timelines
c. Version control and code
4. Understand the necessary components of documentation, including:
a. Metadata
b. Building organizational best practices
c. The fundamentals of useful commenting
d. Combining narrative, code and documentation
e. Organizational transparency
About the Instructor
Dr. Karin Neff is the Data and Assessment Specialist for Bozeman Public Schools, where she works in a data team of one to build data stories that aid student growth and achievement. Dr. Neff relies heavily on open source tools to maintain analytic integrity and reproducibility in the public sector. Dr. Neff received her doctorate in Ecology and Environmental Sciences from Montana State University, where she helped develop laboratory best practices, contributed to documentation strategies, and mentored emerging scientists.
Relevance to Conference Goals
This course will provide an opportunity for emerging analysts to establish best practices in reproducibility and documentation that will serve them for their entire careers. It will also provide tools and information for organizations with small data teams to build workflows that will scale as their organizations and analytic needs grow.
Sat, Feb 22
2:00 PM - 4:00 PM
Regency C
T3 - Project Management Principles for Statisticians
Tutorial
The Project Management Institute (PMI) indicated that:
- 58% of organizations fully understand the value of project management
- 93% of organizations report using standardized project management practices
- 68% of organizations in PMI’s annual survey said that they used outsourced or contract project managers in 2018
- 23% of organizations use standardized project management practices across the entire organization
- 33% use standardized practices, but not across all departments
- 7% of organizations don't use any standard practices at all
Outline & Objectives
Scope: The goal of this workshop is to demonstrate how to apply the basic principles of the Project Management Institute's Project Management Body of Knowledge (PMBOK) in the workplace.
Objectives:
• Learn the basic PMBOK templates, such as the charter, project plan, budget, risk management plan, and presentation;
• Understand how to use the basic PMBOK templates in Google Drive;
• Draft a charter, project plan, budget, risk management plan, and presentation in Google Drive.
Benefits:
• Understand the principles of project management based on the PMBOK;
• Learn how to apply basic project management tools such as the project charter, project management plan, and risk management plan; and
• Draft a presentation for managers.
Level: Basic
Software: Google Drive
About the Instructor
Ana Valentín serves as an Enterprise Service Program Manager for the Enterprise Service Branch in the Service Delivery Division of the National Oceanic and Atmospheric Administration (NOAA) Office of the Chief Information Officer. In this capacity, Ana leads various technology project teams that strengthen NOAA’s mission. Ana promotes diversity and inclusion through the Latinos@NOAA Employee Resource Group (ERG), an organization she co-founded in 2014 that received the 2018 NOAA Administrator’s Award. Ana taught undergraduate statistics and math courses and a graduate clinical research course for six years. Ana has also published research articles and has presented professional development workshops at the national conferences of the League of United Latin American Citizens Federal Training Institute. Ana holds a BA and an MPH from the University of Puerto Rico, an MS from the University of Fairfax, and graduate certificates from George Washington University, University of Maryland University College, and the United States Army War College. In her spare time, Ana collaborates with various nonprofits while pursuing a D.Sc. in cybersecurity at Marymount University in Virginia.
Relevance to Conference Goals
• Better communicate and collaborate with their clients and customers
• Have a positive effect on their organization or enhance their professional
Sat, Feb 22
2:00 PM - 4:00 PM
Regency D
T4 - Introduction to Bayesian Data Analysis
Tutorial
Instructor(s): An-Ting Jhuang, UnitedHealth Group R&D; Christina Phan Knudson, University of St. Thomas
This short course introduces Bayesian statistics at a level appropriate for all practitioners in both academia and industry. The two-hour course covers fundamental Bayesian concepts, model creation, diagnostics, and interpretation of results.
Examples and sample code will develop participants’ intuition and practical abilities. Learners will understand the differences between frequentist statistics and Bayesian statistics; explain the importance and use of priors, posteriors and likelihoods; understand the use and function of Markov chain Monte Carlo (MCMC) methods; write R code to create Bayesian models; examine convergence of posterior samples; and integrate results into decision-making.
Participants will implement these skills with several examples using practical models (linear regression and logistic regression) with real-world data sets. This workshop will broaden participants’ skill sets for solving real-world problems.
Outline & Objectives
1. Intro to Bayesian concepts
2. Examples: coin flip, linear regression, logistic regression
3. Interpreting results
4. Conjugate priors
5. MCMC samplers: Why do we need them? How do they work?
6. MCMC convergence: definition, intuition, diagnostics, R code, packages
7. Larger example with survival data
--Basic goal: create a logistic regression to model the log odds of survival based on various predictors (e.g. gender, fare class, adult vs child)
--Intermediate: prediction
--Advanced: evaluate impact of the prior distribution, the Monte Carlo sample size, the inclusion/exclusion of variables
8. Review
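The coin-flip example and conjugate priors (items 2 and 4) can be illustrated with a minimal sketch. It is written in Python rather than the course's R, and the Beta(2, 2) prior and the data (7 heads in 10 flips) are hypothetical values chosen for illustration, not taken from the course materials. With a Beta prior and a binomial likelihood, the posterior is again a Beta distribution, so updating only requires adding the observed counts to the prior parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPrior:
    """Beta(alpha, beta) distribution over a coin's probability of heads."""
    alpha: float
    beta: float

    def update(self, heads: int, tails: int) -> "BetaPrior":
        # Conjugacy: Beta prior x binomial likelihood -> Beta posterior,
        # obtained by adding the observed counts to the prior parameters.
        return BetaPrior(self.alpha + heads, self.beta + tails)

    def mean(self) -> float:
        """Mean of the Beta distribution, alpha / (alpha + beta)."""
        return self.alpha / (self.alpha + self.beta)


# Hypothetical prior: the coin is believed to be roughly fair.
prior = BetaPrior(2.0, 2.0)
# Hypothetical data: 7 heads and 3 tails in 10 flips.
posterior = prior.update(heads=7, tails=3)

print(posterior)                   # BetaPrior(alpha=9.0, beta=5.0)
print(round(posterior.mean(), 3))  # 0.643
```

Because the posterior stays in the Beta family, no sampler is needed for this model; MCMC becomes useful once a model, such as the logistic regression in item 7, has no closed-form posterior.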
Goal: introduce participants to the Bayesian statistical framework. Participants will understand and gain hands-on experience with priors, likelihoods, and posteriors; Markov chain Monte Carlo (MCMC) samplers; MCMC convergence; and the basic Bayesian workflow.
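As a rough illustration of an MCMC sampler and a convergence check, the sketch below runs several random-walk Metropolis chains on a hypothetical coin-flip posterior (7 heads in 10 flips with a Beta(2, 2) prior, whose exact posterior mean is 9/14) and computes the classic Gelman-Rubin potential scale reduction factor (R-hat). It is a Python sketch, not the course's R code, and it uses the original Gelman-Rubin formula rather than the stabilized version in Vats and Knudson; the step size, chain length, burn-in, seed, and starting points are arbitrary assumptions.

```python
import math
import random
import statistics

# Hypothetical data and prior, chosen only for illustration.
HEADS, TAILS = 7, 3      # 7 heads in 10 flips
ALPHA, BETA = 2.0, 2.0   # Beta(2, 2) prior on the probability of heads


def log_posterior(p: float) -> float:
    """Unnormalized log posterior: Beta prior times binomial likelihood."""
    if not 0.0 < p < 1.0:
        return -math.inf  # zero posterior density outside (0, 1)
    return (HEADS + ALPHA - 1) * math.log(p) + (TAILS + BETA - 1) * math.log(1.0 - p)


def metropolis(n_steps: int, start: float, step_size: float,
               rng: random.Random) -> list[float]:
    """Random-walk Metropolis sampler with a symmetric normal proposal."""
    current, current_lp = start, log_posterior(start)
    samples = []
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step_size)
        proposal_lp = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio); the symmetric
        # proposal densities cancel. 1 - random() lies in (0, 1], so log is safe.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


def gelman_rubin(chains: list[list[float]]) -> float:
    """Classic Gelman-Rubin potential scale reduction factor (R-hat)."""
    m, n = len(chains), len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    grand_mean = statistics.fmean(chain_means)
    # B: between-chain variance; W: average within-chain sample variance.
    b = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)
    w = statistics.fmean([statistics.variance(c) for c in chains])
    pooled_var = (n - 1) / n * w + b / n
    return math.sqrt(pooled_var / w)


rng = random.Random(2020)           # fixed seed so the run is reproducible
starts = [0.05, 0.35, 0.65, 0.95]   # spread-out starting points
burn_in = 1000
chains = [metropolis(6000, s, 0.2, rng)[burn_in:] for s in starts]

print("R-hat:", round(gelman_rubin(chains), 3))
pooled = [x for chain in chains for x in chain]
print("posterior mean:", round(statistics.fmean(pooled), 3))  # should be near 9/14
```

R-hat values close to 1 suggest the chains are sampling the same distribution; chains that have not mixed give values well above 1. Starting chains at spread-out points is what gives the diagnostic a chance to detect poor mixing.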
About the Instructor
An-Ting Jhuang holds a PhD in statistics from North Carolina State University. She has developed new Bayesian methods to tackle problems in epidemiology and materials science. Her research focuses on sparse signal detection in spatial and spatiotemporal statistics, and on exposure assessment. She is a Principal Data Scientist at UnitedHealth Group Research & Development in Minnesota. On a day-to-day basis, she identifies research directions and applies statistical methods to solve scientific and business questions in the health care field.
Christina Knudson holds a PhD in statistics from the University of Minnesota. She is an assistant professor at the University of St. Thomas in Minnesota. She is the author and maintainer of the R package glmm, which is downloaded from CRAN over 1,000 times per month. Her most recent contribution is “Revisiting the Gelman-Rubin Diagnostic” (Vats and Knudson), which stabilizes the Gelman-Rubin (GR) statistic, proposes a principled GR threshold for terminating samplers, and connects effective sample size to the GR statistic. Additionally, she is the organizer of the Twin Cities chapter of R-Ladies.
Relevance to Conference Goals
Our goal of jump-starting participants’ Bayesian statistics abilities directly aligns with the conference goal of providing participants with the opportunity to learn new statistical methodologies and best practices in statistical analysis. We have designed this short course to broaden applied statisticians’ skill sets so that they can better consult with customers and organizations and help them solve real-world problems. Our short course will teach statistical techniques that participants can apply to their jobs as applied statisticians; participants will leave the workshop having practiced with several examples using practical models (linear regression and logistic regression) with real data sets.