Abstract:
|
Electronic health records are widely used to conduct nonrandomized studies of drug treatment effects. Some researchers use propensity score (PS) methods to control for bias and confounding in these observational studies. Because it is often practically infeasible to include all available covariates in the PS model, a critical issue is how to select a relevant subset of covariates. Because an optimal PS model includes all confounders as well as covariates related only to the outcome, researchers have proposed outcome-data-driven approaches to aid PS variable selection. However, using outcome data during PS design contradicts a core principle of statistical causal inference, which dictates that one should separate the design phase (analogous to a pre-experiment stage, at which point no outcome data have yet been observed) from the analysis phase of the inference procedure. We show that, under certain non-null treatment effect scenarios, exploiting observed outcomes in PS construction indeed leads to biased treatment effect estimates and underpowered significance tests. We provide some alternative approaches to variable selection for high-dimensional propensity score models.
|