For longitudinal data, mixed models include random subject effects to account for the influence of subjects on their responses over the repeated assessments. The error variance and the variance of the random effects are usually assumed to be homogeneous. These variance terms characterize the within-subjects (error variance) and between-subjects (random-effects variance) variation in the data. In studies using mobile health measurement modalities such as Ecological Momentary Assessment (EMA), up to thirty or forty observations are often obtained for each subject, and interest frequently centers on changes in the variances, both within- and between-subjects. Such EMA studies also often include several waves of data collection. In this workshop, we focus on an adolescent smoking study using EMA at both one and several measurement waves, where interest centers on characterizing changes in mood variation associated with smoking. We describe how covariates can influence the mood variances, and also describe an extension of the standard mixed model that adds a subject-level random effect to the within-subject variance specification. This permits subjects to influence both the mean, or location, and the variability, or (square of the) scale, of their mood responses. Additionally, we allow the location and scale random effects to be correlated. These mixed-effects location scale models have useful applications in many research areas where interest centers on joint modeling of the mean and variance structure. Computer applications using SAS NLMIXED and the freeware MIXREGLS program will be described and illustrated.
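As a sketch of the model just described (notation illustrative; this follows the common formulation estimated by MIXREGLS), the outcome for subject $i$ at occasion $j$ combines a location random effect with log-linear models for both variances:

```latex
y_{ij} = \mathbf{x}_{ij}'\boldsymbol{\beta} + \upsilon_i + \varepsilon_{ij},
\qquad j = 1, \dots, n_i ,
```

```latex
\sigma^2_{\upsilon_{ij}} = \exp\!\left(\mathbf{u}_{ij}'\boldsymbol{\alpha}\right),
\qquad
\sigma^2_{\varepsilon_{ij}} = \exp\!\left(\mathbf{w}_{ij}'\boldsymbol{\tau} + \omega_i\right),
```

where $\upsilon_i$ is the location (mean) random effect, $\omega_i$ is the scale random effect entering the within-subject (error) variance, covariates $\mathbf{u}_{ij}$ and $\mathbf{w}_{ij}$ can shift the between- and within-subject variances, and $(\upsilon_i, \omega_i)'$ is bivariate normal, allowing the location and scale effects to be correlated.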
Objectives: Estimation of causal effects is a primary goal in most health policy research, e.g., to assess the impact of a policy change on patient outcomes. When controlled experiments are infeasible, analysts must rely on observational data in which treatment (or exposure) assignment is outside the researchers' control. Attendees will gain hands-on experience and guidance for estimating causal effects using inverse probability of treatment weights (IPTW) in observational studies. We will also provide guidance on how to implement omitted-variable sensitivity analyses. We will provide an introduction to causal modeling using the potential outcomes framework and the use of IPTW to estimate causal effects from observational data. We will also present step-by-step guidelines on how to estimate and perform diagnostic checks of the estimated weights for: (1) settings with two treatment groups of interest and (2) settings where treatments are continuous. Additionally, the workshop will provide an overview of how to implement omitted-variable sensitivity analyses, which are critical to any analysis using IPTW, since the robustness of causal effects from an IPTW analysis depends on there being no unobserved confounders. Attendees will gain hands-on experience estimating each type of weight using machine learning methods, as well as estimating the causal effects of interest using the IPTW. Code will be shared for R, SAS, and Stata. Attendees should be familiar with linear and logistic regression.
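To fix ideas, here is a minimal sketch of IPTW for the two-treatment-group setting on simulated data, using a simple logistic propensity model rather than the machine learning methods the workshop covers; all variable names and the data-generating numbers are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=(n, 2))                       # observed confounders

# treatment assignment depends on x[:, 0], creating confounding
p_true = 1 / (1 + np.exp(-(0.5 * x[:, 0] - 0.5 * x[:, 1])))
z = rng.binomial(1, p_true)                       # 1 = treated, 0 = control
y = 1.0 * z + x[:, 0] + rng.normal(size=n)        # true causal effect = 1

# step 1: estimate the propensity score e(x) = P(Z = 1 | X)
ps = LogisticRegression().fit(x, z).predict_proba(x)[:, 1]

# step 2: form inverse probability of treatment weights (ATE weights)
w = np.where(z == 1, 1 / ps, 1 / (1 - ps))

# step 3: weighted difference in mean outcomes estimates the causal effect
ate = (np.average(y[z == 1], weights=w[z == 1])
       - np.average(y[z == 0], weights=w[z == 0]))
```

A naive unweighted comparison of the two groups would be biased upward here, because subjects with larger `x[:, 0]` are both more likely to be treated and have larger outcomes; the weights remove that imbalance.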
Abstract: Observational studies provide opportunities to learn about the effect of policy interventions for which little or no trial data are available. However, in such studies, treatment or intervention allocation may be confounded and care is needed to disentangle observed relationships and infer causal effects.
We provide an overview of modern techniques for analyzing observational data with primary focus on the field of targeted learning, which facilitates the use of machine learning tools to flexibly adjust for confounding while yielding valid statistical inference. We will discuss methods for comparative effectiveness studies for single time-point interventions, introduce the multi time-point extension of these methods, and discuss strategies for dealing with missing data. Methods will be illustrated using data from recent observational studies and extracted from electronic medical records.
The course is geared toward researchers with some experience in data analysis and statistics. A basic understanding of confounding, probability (e.g., the distribution of a random variable, its mean and variance), confidence intervals, and regression (linear and logistic) is assumed. Advanced knowledge of these topics is useful, but not necessary.
The stepped wedge cluster randomized design is a relatively new type of cluster randomized design that has seen a rapid increase in popularity over the past decade. In this design, all clusters usually start the trial in the control condition and end in the intervention condition; clusters cross from control to intervention sequentially, in an order determined by randomization, while outcomes are observed repeatedly over time in each cluster. The stepped wedge design is potentially useful for evaluating health policy and services interventions rolled out in real-world settings. While the stepped wedge design can achieve greater power than a parallel-arm cluster trial and may facilitate cluster recruitment, it has numerous methodological complexities that need to be considered in its design, analysis, and reporting. The design is also vulnerable to additional risks of bias compared to parallel-arm designs. Most importantly, analyses of stepped wedge designs must always account for time, to avoid confounding of the intervention effect with secular trends, and must account for both within-period and between-period intracluster correlations to obtain correct standard errors. In this workshop we will review the rationale and unique characteristics of the stepped wedge cluster randomized design, consider its implications for sample size calculation and analysis, and discuss its strengths and weaknesses compared to traditional designs. We will consider sample size calculation and analysis procedures for both cohort and cross-sectional designs, as well as complete and incomplete designs. We will review special requirements for transparent reporting. Emphasis will be on application; different types of stepped wedge designs will be described with examples from health policy and services research.
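The sequential crossover pattern can be made concrete with a small sketch of a hypothetical complete stepped wedge layout (four randomized sequences of clusters observed over five periods; 0 = control, 1 = intervention — the dimensions are illustrative):

```python
import numpy as np

# hypothetical complete stepped wedge: 4 sequences, 5 measurement periods
n_seq, n_per = 4, 5
design = np.zeros((n_seq, n_per), dtype=int)
for s in range(n_seq):
    design[s, s + 1:] = 1   # sequence s crosses to intervention after period s

print(design)
# Every sequence starts in control (first column all 0) and ends in
# intervention (last column all 1); once a cluster crosses over, it stays.
```

Because each column mixes control and intervention clusters in different proportions over time, period (time) effects are confounded with the intervention unless time is explicitly modeled, which is why the design must always adjust for secular trends.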
Different providers and health insurance systems create vast amounts of health information. However, opportunities to use this information for PCOR/CER are often missed because the information cannot be combined due to privacy regulations. Record linkage is a powerful tool that enables researchers to link data from two or more sources when unique identifiers such as social security numbers are not available. The resulting linked databases allow researchers to leverage preexisting information to perform a vast array of rich PCOR/CER analyses.
Record linkage works well when there are many linking variables, but linking with limited identifying information, as is common in PCOR/CER, is more difficult and may suffer from linkage errors. Statistical analyses can be adversely affected by incorrectly linked records: even a small number of incorrectly linked records can lead to large biases in estimation. Some statistical methodologies have been proposed to adjust for possible linkage errors when estimating marginal and conditional correlations. However, these methods are not readily available and may rely on assumptions that are invalid in PCOR/CER studies. For example, some methods assume that the probabilities of linkage errors are known or can be estimated from the data. Another assumption is that the linkage is non-informative, meaning that the linking errors are not correlated with the outcomes given the covariates. Lastly, these methods have not been applied to estimate causal effects from linked observational data.
This workshop describes the limitations of current methods and compares them to recently proposed methods. Throughout, we demonstrate the implementation of the different methods in health-related studies using statistical software. The workshop is intended for health services researchers and statisticians who are interested in estimating correlations and causal relationships from linked data sources.
Probability sampling has long been the standard basis for design-based inference from a sample to a target population. In the era of big data and increasing data collection costs, however, there has been growing demand for methods that use other types of data, e.g., administrative data, data from opt-in panels, etc. Additionally, there may be a need to combine these data with a small probability sample. This has the potential to improve the cost efficiency of survey estimation without loss of statistical accuracy. Given the potential bias and coverage error inherent in non-probability samples, the use of traditional weighted survey estimators for data from such surveys may not be statistically valid.
We will discuss some of the benefits and disadvantages associated with probability and non-probability sample data, and some of the methods suggested for utilizing non-probability sample data. These methods include:
A. Calibration; B. Propensity-based methods; C. Superpopulation modeling; and D. Statistical matching
We will discuss examples using each of the above methods and provide opportunities for participants to implement methods using simulated data.
Participants will learn: 1) approaches to utilizing non-probability samples, 2) approaches to combining probability and non-probability samples, and 3) ideas for assessing a non-probability sample's bias.
What exactly is deep learning? How does it differ from inferential statistics or machine learning approaches to predictive model development? Most importantly, when might I use it in health applications? The popular media have touted AI/deep learning as the future of big data analytics, yet many applied statisticians have not been trained in deep learning methods. This workshop will give a practical introduction to the fundamental concepts of deep neural networks that underlie the notion of deep learning/AI, with hands-on applications using Python and TensorFlow. We will cover core concepts in neural networks, including nodes, activation functions, regularization, forward and backward propagation, and gradient descent optimization. We will cover different network architectures with examples, including artificial neural networks, feedforward networks, recurrent neural networks, and convolutional neural networks. We will close the course with a review of successful applications of neural networks in healthcare to connect this applied learning with state-of-the-art published successes. Students will be able to follow along and run code on their own laptops using an open-source environment built specifically for the workshop (a virtual machine). We will also introduce students to the free Google Colaboratory for deep learning. This workshop resonates with the conference theme by connecting the abstract ideas of AI/deep learning (a bleeding-edge method for dealing with complex data relationships) to tangible applications in healthcare.
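As a taste of the core concepts (nodes, activation functions, the forward pass), here is a minimal sketch of a one-hidden-layer feedforward network written in plain NumPy rather than TensorFlow; the weights are random and purely illustrative, so the outputs are untrained predictions:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)            # hidden-layer activation function

def sigmoid(x):
    return 1 / (1 + np.exp(-x))        # squashes output into (0, 1)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))            # 4 samples, 3 input features

# randomly initialized weights and biases for a 3 -> 5 -> 1 network
W1 = rng.normal(size=(3, 5)); b1 = np.zeros(5)
W2 = rng.normal(size=(5, 1)); b2 = np.zeros(1)

h = relu(X @ W1 + b1)                  # forward pass: hidden layer of 5 nodes
yhat = sigmoid(h @ W2 + b2)            # output layer: one probability per sample
```

Training (backward propagation plus gradient descent) would repeatedly adjust `W1, b1, W2, b2` to reduce a loss comparing `yhat` to observed labels; frameworks like TensorFlow automate exactly that step.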
Heterogeneity of treatment effect (HTE) is said to be present when the effect of a treatment varies across patient subpopulations that can be defined by observable patient characteristics. Assessing the presence and extent of heterogeneity of treatment effect is an important component of evaluating the consistency of new treatments. In this course, we plan to provide an overview of both traditional and more recent methods for analyzing and reporting HTE. We will first cover traditional approaches to subgroup analysis and provide guidance for interpreting their results. Topics to be discussed in this portion of the course include: modeling treatment-covariate interactions, hypothesis testing and controlling for multiplicity, and examining qualitative interactions. We will then describe Bayesian hierarchical models for performing subgroup analysis, discuss their interpretation, and discuss both choice of priors and model diagnostics. Finally, we will describe more recently developed Bayesian machine learning methods, and detail their use in quantifying individualized treatment effects. Each of the methods presented in this course will be accompanied by a demonstration of the available software.
Most of the data used to inform health policy decisions come from observational studies such as large-scale surveys conducted by federal, state, and local governments and agencies. Examples include the National Health Interview Survey (NHIS), the Behavioral Risk Factor Surveillance System (BRFSS), and the National Immunization Survey (NIS). Because these survey data violate the i.i.d. assumptions of standard statistical methods, they require special analysis methods — the topic of this workshop.
This workshop provides a crash course in complex survey data analysis for health researchers, statisticians and specialists who need to analyze health data collected through complex survey designs. (If the study design description contains keywords like "multistage sampling", "random digit dialing", "nonresponse adjustment" or "final weights", it is a complex survey data set.) The workshop will highlight the issues associated with complex survey data for researchers who have had no exposure to the topic. If you took courses in sampling or survey data analysis, and you remember the material well, this workshop will have limited benefit for you. If you are using weighted survey data, but not entirely sure how the weights were created, or whether to run weighted or unweighted analyses, or are confused about the meaning of your results and what to report -- this is the right workshop for you.
1. Examples of complex health survey data. Survey design trade-offs: frames, coverage, modes and costs.
2. Features of complex surveys: weights, clusters, strata, nonresponse adjustments, and their impact on estimates and standard errors.
3. Available software: R, Stata, SAS, SUDAAN. Syntax specification basics.
4. Fitting statistical models with survey data.
5. Survey data quality control: nonresponse biases, coverage biases, mode effects.
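To illustrate why running weighted versus unweighted analyses matters (item 2 above), here is a toy simulation of a stratified survey with a deliberately oversampled stratum; the population sizes, sampling rates, and means are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# population: stratum A has 9,000 people (mean outcome 0),
#             stratum B has 1,000 people (mean outcome 10)
# sample: 100 people from each stratum, so B is heavily oversampled
yA = rng.normal(0, 1, 100)
yB = rng.normal(10, 1, 100)
y = np.concatenate([yA, yB])

# design weight = population size / sample size within each stratum
w = np.concatenate([np.full(100, 9000 / 100), np.full(100, 1000 / 100)])

unweighted = y.mean()                      # badly biased: near 5
weighted = np.sum(w * y) / np.sum(w)       # near the true population mean of 1
```

The unweighted mean treats the oversampled stratum as if it were 50% of the population instead of 10%, while the weighted estimator recovers the population mean of 0.9 × 0 + 0.1 × 10 = 1. Real survey software (R's `survey` package, Stata's `svy` commands, SAS's `SURVEY` procedures, SUDAAN) additionally uses the strata and clusters to get the standard errors right, which this sketch does not attempt.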
Social network data are complex, with many subtleties, and social network analysis (SNA) is an emerging area in statistics and other fields. This workshop will provide an overview of the key types of social network data, with examples drawn from medicine and allied fields. The emphasis will be on the statistical methods used for analyzing network data in each situation and the statistical challenges confronting the ability to draw reliable statistical inferences. Because many questions involving society and organizational structure or relationships may be represented by networks, SNA has tremendous potential to advance these fields in novel ways. The workshop will consist of three parts: (i) introduction to different forms of network data and descriptive measures of networks, or of an actor's structural position within them, including the incorporation of these into statistical analyses of networks to determine their relationship to other variables of interest; (ii) relational models in which the network itself is a multivariate dependent variable; (iii) models or analyses in which networks are fundamental to the construction of explanatory variables, including models to estimate peer effects and to study diffusion. The workshop will be at a level that relies on a general, as opposed to an in-depth, knowledge of statistics.
In recent years, a range of new weighting methods has been developed for comparative effectiveness research, overcoming the limitations of traditional inverse probability weighting (IPW) methods. This course will cover the general class of “balancing weights” methods, which is readily adaptable to specific study goals (Li, Morgan, and Zaslavsky, 2018). In particular, we will focus on the overlap weights, which offer several important statistical and clinical advantages. We will discuss their use in (1) generic binary cross-sectional treatments, (2) multi-valued treatments, (3) subgroup analysis, and (4) covariate adjustment in randomized trials. Practical issues of implementation, such as propensity score modeling, balance checking, augmented estimation and variance estimation, and the associated software package, will be discussed. Connections to other popular recent methods, such as the covariate-balancing propensity score, stabilized balancing weights, and entropy balancing, will also be presented. All the methodologies will be illustrated using real-world examples in medicine and health policy.
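As a sketch of the balancing weights framework (following Li, Morgan, and Zaslavsky, 2018), each choice of a tilting function $h(x)$ applied to the propensity score $e(x)$ yields a different member of the class:

```latex
w_1(x) \propto \frac{h(x)}{e(x)}
\qquad \text{(treated)}, \qquad
w_0(x) \propto \frac{h(x)}{1 - e(x)}
\qquad \text{(control)},
```

so that $h(x) = 1$ recovers standard IPW targeting the average treatment effect, $h(x) = e(x)$ gives the weights targeting the treated, and $h(x) = e(x)\{1 - e(x)\}$ gives the overlap weights:

```latex
w_1(x) \propto 1 - e(x), \qquad w_0(x) \propto e(x).
```

Because the overlap weights are bounded, they avoid the extreme weights that can destabilize IPW when estimated propensity scores approach 0 or 1, and they emphasize the subpopulation with the greatest clinical equipoise.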
Data analysts tend to write a lot of reports, describing their analyses and results, for their collaborators or to document their work for future reference. When first starting out, we often write an R script with all of the work and simply email collaborators, describing the results and attaching various graphs. In discussing the results, there can often be confusion about which graph is which.
Moving to formal reports, with Word or LaTeX, much time is still spent getting the figures to look right. Mostly, the concern is about page breaks and generating reproducible results. Imagine the work required to find the right analysis code to fix a problem in a report generated four years ago on an old dataset that you hope you can still find.
Ideally, such analysis reports are reproducible documents: If an error is discovered, or if some additional subjects are added to the data, you can just re-compile the report and get the new or corrected results (versus having to reconstruct figures, paste them into a Word document, and further hand-edit various detailed results).
This workshop will walk you through a key R package called knitr, the leading solution for these types of reports. It allows you to create a document that is a mixture of text and chunks of code. When the document is processed by knitr, the chunks of code are executed, and graphs or other results are inserted into a professional-looking final document. Reports can be created in many formats, such as Word, PDF, or HTML webpages, and are highly customizable.
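A minimal R Markdown source file shows the idea; the file name, data file, and chunk contents below are purely illustrative. Text and code chunks live side by side, and knitr replaces each chunk with its computed output when the report is compiled:

````markdown
---
title: "Analysis Report"
output: html_document
---

The dose-response results are summarized below.

```{r dose-plot, echo=FALSE}
dat <- read.csv("trial.csv")   # hypothetical data file
summary(dat$outcome)
plot(dat$dose, dat$outcome)
```
````

If the data change or an error is found, recompiling this one file regenerates the summary and the figure in place, which is exactly the reproducibility benefit described above.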
Prior knowledge of R is helpful, but not necessary.
LinkedIn is great, and your department or office website may have a bio page for you, but you need your own space to share your work: to demonstrate your talent, share recent projects or research, and create and curate scientific content. Share your course lecture notes, blog about your recent research, or present analysis results in all their grisly detail as a supplement to a presentation or manuscript. This hands-on workshop will walk you through the process of creating two types of websites, with no knowledge of HTML or CSS needed. The first type is a simple site that links together a series of web pages, written in Markdown, into a website framework. This is ideal for a small project, such as presenting class materials or an interactive dashboard. The second type is ideal for users who wish to write a blog or present a more “modern” feel on their website. This type uses the website generator Hugo, but again, no knowledge of Hugo is necessary. We will use the RStudio environment to build these websites using Markdown, and we will demonstrate how live code and output can be shown in these webpages, but no direct knowledge of R is required. Both methods require knowledge of version control and use of GitHub.
This COMPASS science communication training will help participants share what they do, what they know—and most importantly, why it matters—in clear, lively terms. Grounded in the latest research on science communication, this training is designed to help participants find the relevance of their science for the audiences they most want to reach—journalists, policymakers, the public, and even other scientists.