Abstract Details

Activity Number: 375 - Modern Statistical Methods for Comparative Effectiveness Research
Type: Invited
Date/Time: Tuesday, July 30, 2019 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistics in Epidemiology
Abstract #300300
Title: Errors in Electronic Health Records: What Two Phase Sampling Teaches Us About Data Validation
Author(s): Bryan E Shepherd* and Gustavo Amorim and Ran Tao and Sarah Lotspeich and Pamela Shaw
Companies: Vanderbilt University School of Medicine and Vanderbilt University and Vanderbilt University Medical Center and Vanderbilt University and University of Pennsylvania
Keywords: measurement error; two phase sampling; electronic health records
Abstract:

Electronic health records (EHRs) and other routinely collected data are increasingly used for medical research. These data are prone to errors, often across multiple variables, and findings based on these data can be misleading. Data validation is sometimes performed in subsamples of records. Validated subsets are used to describe the sensitivity/specificity of phenotype-defining algorithms and to justify their use in the larger cohort. However, the error-rate information in the validated subset is rarely combined with the unvalidated data to account for uncertainty in variables and to improve precision. In addition, the choice of records to validate is often not carefully considered. This is a two-phase sampling problem: phase 1 is the error-prone EHR data and phase 2 is the validated subsample. We demonstrate ways to incorporate the validation data into the larger dataset to improve estimation. Through simulations guided by the two-phase sampling literature, we consider different approaches for selecting validation subsamples. We then demonstrate the efficiency of different sampling schemes using a fully validated EHR dataset of HIV-positive individuals.
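
The two-phase framing above can be illustrated with a small simulation. The sketch below is not the authors' method. It assumes a single binary variable recorded with error, a validation subsample stratified on the error-prone EHR value, and a standard Horvitz-Thompson (inverse-probability-weighted) estimate of the true prevalence. All function names, parameter values, and error rates are hypothetical and chosen only for illustration.

```python
import random


def simulate_phase1(n, prevalence=0.2, sensitivity=0.8, specificity=0.9, seed=0):
    """Simulate phase 1: a list of (true_value, ehr_value) pairs.

    In practice the true value is unknown until a record is validated;
    it is kept here only so the sketch can be checked. All rates are
    hypothetical.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        truth = 1 if rng.random() < prevalence else 0
        if truth == 1:
            ehr = 1 if rng.random() < sensitivity else 0
        else:
            ehr = 0 if rng.random() < specificity else 1
        records.append((truth, ehr))
    return records


def stratified_validation_sample(records, n_per_stratum, seed=1):
    """Phase 2: validate up to n_per_stratum records in each EHR-value stratum.

    Returns (record_index, sampling_probability) pairs, where the probability
    is the fraction of that stratum selected for validation.
    """
    rng = random.Random(seed)
    sample = []
    for level in (0, 1):
        stratum = [i for i, (_, ehr) in enumerate(records) if ehr == level]
        if not stratum:
            continue
        k = min(n_per_stratum, len(stratum))
        prob = k / len(stratum)
        for i in rng.sample(stratum, k):
            sample.append((i, prob))
    return sample


def ipw_prevalence(records, sample):
    """Horvitz-Thompson estimate of the true prevalence.

    Each validated record is weighted by the inverse of its probability
    of being selected for validation.
    """
    weighted_total = sum(records[i][0] / prob for i, prob in sample)
    return weighted_total / len(records)


if __name__ == "__main__":
    phase1 = simulate_phase1(5000)
    phase2 = stratified_validation_sample(phase1, n_per_stratum=200)
    naive = sum(ehr for _, ehr in phase1) / len(phase1)
    print("Naive EHR prevalence:", round(naive, 3))
    print("Two-phase IPW estimate:", round(ipw_prevalence(phase1, phase2), 3))
```

Changing how `stratified_validation_sample` allocates records across strata is one simple way to compare validation sampling schemes in a simulation of this kind.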


Authors who are presenting talks have a * after their name.
