Online Program

Saturday, February 22
PS3 Poster Session III & Continental Breakfast Sat, Feb 22, 7:30 AM - 9:00 AM
Bayshore II-IV

Estimation of Error in Electronically Available Variables in a Large Hospital Database by Simulation and Comparison with Manual Data Abstraction (302829)

Linda M Baldini, Division of Infection Control/Hospital Epidemiology, Department of Health Care Quality, Beth Israel
*Baevin S. Carbery, Division of Infectious Diseases, Department of Medicine, Beth Israel Deaconess Medical Center 
Long Ngo, Division of General Medicine, Department of Medicine, Beth Israel Deaconess Medical Center 
Jocelyn A Pedrick, Division of Infectious Diseases, Department of Medicine, Beth Israel Deaconess Medical Center  
Sharon B Wright, Division of Infectious Diseases, Department of Medicine, Beth Israel Deaconess Medical Center
David S Yassa, Division of Infectious Diseases, Department of Medicine, Beth Israel Deaconess Medical Center 

Keywords: Health care, Error rates, Robustness, Bias, Efficiency, Modeling, Big Data

The increasing availability of large hospital electronic databases provides opportunities to perform complex modeling and obtain more robust estimates of parameters of interest. However, errors in these large databases (data entry, transcription, misclassification) may produce bias and inefficiency in the models’ estimated coefficients. In a study of post-partum infection risk factors that included 39,640 subjects and 31 electronically available variables (EAVs), we compared the electronic values against manual medical record review of a subset of the data and estimated the upper bound of the 95% confidence interval of the error rate to be <1% for 15 EAVs, 1-5% for 6 EAVs, 5-10% for 3 EAVs, and >10% for 7 EAVs. Transcription and misclassification errors, both differential and random, contributed most to the magnitude of the error rate. We plan to report, using simulation, the effects of the error rate and its distribution on the estimated parameters (e.g., a simulated error rate of 10% for the EAV ‘circumcision’ significantly changed the parameter in all models). This simulation analysis allowed thoughtful evaluation of each EAV’s error rate, which is critical in determining the utility of large data sets.
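
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method, which the abstract does not specify. It assumes a two-sided exact (Clopper-Pearson) interval for the error rate, purely random (non-differential) errors, an odds ratio from a 2x2 table as a stand-in for a model parameter, and synthetic data. All function names and all numeric values in `demo` are hypothetical.

```python
import math
import random


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); intended for modest n."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_upper_bound(errors, n, conf=0.95):
    """Upper limit of the two-sided exact (Clopper-Pearson) interval.

    Assumption: the abstract does not say which interval was used.
    """
    if errors >= n:
        return 1.0
    alpha = 1.0 - conf
    lo, hi = errors / n, 1.0
    for _ in range(60):  # bisection: the CDF falls as p rises
        mid = (lo + hi) / 2
        if binom_cdf(errors, n, mid) > alpha / 2:
            lo = mid
        else:
            hi = mid
    return hi


def flip_at_random(values, error_rate, rng):
    """Flip each binary value independently with probability error_rate."""
    return [1 - v if rng.random() < error_rate else v for v in values]


def odds_ratio(exposure, outcome):
    """Odds ratio from a 2x2 table; cells start at 0.5 to avoid division by zero."""
    cells = {(e, o): 0.5 for e in (0, 1) for o in (0, 1)}
    for e, o in zip(exposure, outcome):
        cells[(e, o)] += 1
    return (cells[(1, 1)] * cells[(0, 0)]) / (cells[(1, 0)] * cells[(0, 1)])


def demo(n=20000, error_rate=0.10, seed=0):
    """Compare the odds ratio before and after simulated random errors.

    The prevalence (0.3) and outcome risks (0.25, 0.10) are made-up values.
    """
    rng = random.Random(seed)
    exposure = [1 if rng.random() < 0.3 else 0 for _ in range(n)]
    outcome = [1 if rng.random() < (0.25 if e else 0.10) else 0 for e in exposure]
    noisy = flip_at_random(exposure, error_rate, rng)
    return odds_ratio(exposure, outcome), odds_ratio(noisy, outcome)
```

For example, `exact_upper_bound(2, 200)` gives the bound for 2 discrepancies among 200 manually reviewed records, and `demo()` returns the odds ratio computed from the clean and from the error-injected variable. Differential errors, which the abstract also mentions, would need the flip probability to depend on the outcome and are not covered here.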