Abstract:
|
In electronic health record (EHR) data, many diseases can be identified from diagnostic codes. However, diagnostic codes alone are unlikely to capture all outcomes. EHR data often contain longitudinal measurements from laboratory tests (labs), which can be used both to diagnose diseases and to monitor them. In practice, labs are sometimes used to identify additional outcomes beyond those identified from diagnostic codes. We propose a novel semiparametric model for the joint distribution of a continuous longitudinal outcome and the baseline covariates using an enriched Dirichlet process (EDP) prior. This joint model decomposes into a linear mixed model for the outcome given the covariates and marginal distributions for the covariates. The nonparametric EDP prior is placed on the regression and spline coefficients, the error variance, and the parameters governing the predictor space. We predict the outcome at unobserved time points for subjects with data at other time points, as well as for completely new subjects for whom only covariates are available. We use the parametric g-formula to estimate marginal causal effects on these lab data-enhanced outcomes.
|