Abstract:
|
Electronic health records (EHR) arise from health encounters that occur continuously in time. G-methods are often applied to EHR data to estimate causal effects, and they rely on the temporal relationships among treatment, confounders, and outcome. The analysis data must therefore be structured in discrete time intervals (e.g., weeks). We examined whether the choice of time interval can affect inference. We simulated a ‘complete’ dataset (N=50,000) under no treatment effect with 12 timepoints, a confounder L(t), a binary treatment A(t), and a time-to-event outcome Y(t). From this, we derived a ‘coarse’ dataset with 3 intervals by merging times 1-4, 5-8, and 9-12 into one interval each, taking the mean of L(t) and the maximum of A(t) and Y(t) within each interval. In each dataset, we compared two interventions, always treat vs. never treat, by estimating the risk difference (RD) with a 95% confidence interval (CI) using the parametric g-formula. As expected, the RD in the ‘complete’ dataset was 0.000 (95% CI: -0.006, 0.007). The RD in the ‘coarse’ dataset was 0.002 (95% CI: -0.003, 0.007), also consistent with the true null effect, i.e., an unbiased estimate. In future work, we plan to increase the number of timepoints and vary the level of confounding.
|
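The coarsening step described in the abstract (times 1-4, 5-8, and 9-12 merged, with the mean of L(t) and the maximum of A(t) and Y(t)) can be illustrated with a minimal sketch. This is not the authors' code: the function name `coarsen`, the long-format row layout, and the keys `id`, `t`, `L`, `A`, `Y` are assumptions made for illustration only, and the toy person below is invented data, not a simulation result.

```python
from collections import defaultdict
from statistics import mean


def coarsen(rows, width=4):
    """Merge consecutive timepoints into intervals of `width` timepoints.

    `rows` is a list of dicts with hypothetical keys: id, t (1-based), L, A, Y.
    Within each interval, L is averaged and A and Y take their maximum.
    """
    groups = defaultdict(list)
    for r in rows:
        # With width=4: times 1-4 -> interval 1, 5-8 -> 2, 9-12 -> 3.
        interval = (r["t"] - 1) // width + 1
        groups[(r["id"], interval)].append(r)

    coarse = []
    for (pid, interval), g in sorted(groups.items()):
        coarse.append({
            "id": pid,
            "interval": interval,
            "L": mean(r["L"] for r in g),  # mean of the confounder
            "A": max(r["A"] for r in g),   # treated at any time in the interval
            "Y": max(r["Y"] for r in g),   # event at any time in the interval
        })
    return coarse


# Toy example (assumed data): one person treated only at t=6,
# followed until an event at t=10.
person = [
    {"id": 1, "t": t, "L": float(t), "A": int(t == 6), "Y": int(t == 10)}
    for t in range(1, 11)
]
coarse = coarsen(person)
for row in coarse:
    print(row)
```

In this toy case, treatment at a single timepoint marks the whole second interval as treated, which illustrates how coarsening can blur the ordering of treatment, confounder, and outcome within an interval.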