Keywords: missingness, EMR, risk adjustment, ICU
EMR-based quality improvement studies often use cluster-randomized stepped-wedge designs, which require excellent control for disease severity within each cluster and time segment. Severity measures are frequently aggregate scores built from variables imported into the EMR from laboratory reports, monitoring equipment, or standard assessments. However, because of missingness, the ability to calculate a risk score from EMR data decreases with each additional variable the score requires. We use the six-variable Sequential Organ Failure Assessment (SOFA) score to illustrate the cumulative effect of EMR data missingness. In a cohort of 2,261 ICU patients with respiratory failure, 42% were missing a baseline SOFA score. Data were missing both at random and systematically, owing to 1) the data source, 2) clinical value, or 3) care process. We discuss the rationale, methods, and results of imputing a baseline SOFA score using selective line-item deletion, imputation by regression, multiple imputation, and extension of the baseline window from 24 to 48 hours. We present simulation results showing the effect of the SOFA imputations on differences in survival, days on ventilator, ICU days, cost of care, and discharge destination.
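The cumulative effect of missingness on a multi-variable score can be illustrated with a short calculation. The sketch below is not the study's method: it assumes, purely for illustration, that each component is recorded independently of the others, which the systematic missingness described above would violate. The function names and the 95% per-variable rate are hypothetical; only the 6 components and the 42% missing figure come from the text.

```python
def complete_case_rate(per_variable_rates):
    """Probability that every component is present, ASSUMING the
    components are missing independently (an illustrative assumption)."""
    rate = 1.0
    for p in per_variable_rates:
        rate *= p
    return rate


def required_per_variable_rate(target_complete_rate, n_variables):
    """Equal per-variable completeness needed to reach a target
    complete-score rate, under the same independence assumption."""
    return target_complete_rate ** (1.0 / n_variables)


# Hypothetical example: each of 6 components recorded for 95% of patients.
print(round(complete_case_rate([0.95] * 6), 3))

# 42% missing scores means 58% complete; per-variable completeness that
# would produce this if all 6 components behaved identically and independently.
print(round(required_per_variable_rate(0.58, 6), 3))
```

Under these assumptions, even components that are each about 91% complete would leave roughly 42% of patients without a calculable score, which is why each added variable reduces the share of usable risk scores.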