Abstract:
|
Despite their many benefits, electronic health record (EHR) data are often subject to complex and poorly understood patterns of missingness, such that the usual missing at random assumption may be untenable. In contrast to traditional methods of sensitivity analysis and estimation of parameter bounds, we explore double sampling, in which complete data are obtained on a sub-sample via intensive follow-up. We discuss assumptions and designs under which the joint density of interest is identified, and present a general approach for constructing estimators in the augmented sample. From this analysis, we show when the initial missingness process is itself identified, and how the associated missing at random assumption can be tested. Further, we apply the framework to derive semiparametric efficient and multiply robust estimators of causal average treatment effects from double-sampled observational data when outcome data are initially missing not at random. Finally, we demonstrate our statistical approach, as well as the practical feasibility of the design, in an EHR-based analysis of weight outcomes following bariatric surgery.
|