Online Program

Utilizing External Validation Data with Bayesian Data Augmentation and Variable Selection to Adjust for Confounding

*Joseph Antonelli, Harvard University 
Francesca Dominici, Harvard School of Public Health 

Keywords: Causal inference, Bayesian methodology, Bayesian data augmentation, Imputation

Large administrative databases are becoming increasingly popular because they allow researchers to examine a wide range of scientific questions. However, these databases frequently lack potentially important confounders that are required for valid estimation of a causal effect. We propose a Bayesian data augmentation approach that uses external validation data to control for unmeasured confounding in large databases with missing confounders. We implement variable selection, which accounts for uncertainty in the choice of confounders, while simultaneously imputing missing confounder values within a coherent Bayesian framework. We illustrate our method with a simulation study and compare our results with those from standard multiple imputation, propensity score calibration, and conditional propensity scores. We apply our method to estimate the effect of surgical resection on survival among cancer patients in Medicare, where additional information is available through the SEER registry. We find that covariates available through the SEER validation data change the estimated effect of surgical resection on survival and that our proposed approach helps to control for the effect of the missing covariates.