All Times EDT

Abstract Details

Activity Number: 90 - Dealing with Error-Prone Electronic Health Record Data via Validation Sampling
Type: Invited
Date/Time: Monday, August 8, 2022 : 8:30 AM to 10:20 AM
Sponsor: Biometrics Section
Abstract #320610
Title: SAT: A Surrogate-Assisted Two-Wave Case Boosting Sampling Method, with Application to EHR-Based Association Studies
Author(s): Yong Chen* and Rebecca Hubbard and Xiaokang Liu and Jessica Chubak
Companies: University of Pennsylvania and University of Pennsylvania and University of Pennsylvania and Kaiser Permanente Washington Health Research Institute
Keywords: Association study; Electronic health records; Error in phenotype; Rare disease; Sampling strategy
Abstract:

Electronic health records (EHR) enable investigation of associations between phenotypes and risk factors. However, studies that rely solely on potentially error-prone EHR-derived phenotypes (i.e., surrogates) are subject to bias, and analyses of low-prevalence phenotypes may also suffer from poor efficiency. Existing methods typically address one of these issues but seldom both. This study addresses both simultaneously by developing new sampling methods that select an optimal subsample in which to collect gold-standard phenotypes, thereby improving the accuracy of association estimation. We develop a surrogate-assisted two-wave (SAT) sampling method in which a surrogate-guided sampling (SGS) procedure and a modified optimal subsampling procedure motivated by the A-optimality criterion (OSMAC) are applied sequentially to select a subsample for outcome validation through manual chart review, subject to budget constraints. A model is then fitted to the subsample using the true phenotypes. Simulation studies and an application to an EHR dataset of breast cancer survivors demonstrate the effectiveness of SAT.
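
The two-wave idea described in the abstract can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the cohort size, prevalence, surrogate sensitivity and specificity, the single covariate, the equal allocation across surrogate strata in wave 1, the use of the surrogate in place of the unknown outcome in the wave 2 sampling score, and the final fit on wave 2 draws alone are all illustrative choices that do not come from the abstract.

```python
import math
import random

def sigmoid(t):
    t = max(-30.0, min(30.0, t))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-t))

def information(x, w, b0, b1):
    """Entries of the weighted information matrix for logit P(Y=1) = b0 + b1*x."""
    h00 = h01 = h11 = 0.0
    for xi, wi in zip(x, w):
        p = sigmoid(b0 + b1 * xi)
        v = wi * p * (1.0 - p)
        h00 += v
        h01 += v * xi
        h11 += v * xi * xi
    return h00, h01, h11

def fit_weighted_logistic(x, y, w, iters=30):
    """Inverse-probability-weighted logistic regression via Newton's method."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = 0.0
        for xi, yi, wi in zip(x, y, w):
            r = wi * (yi - sigmoid(b0 + b1 * xi))
            g0 += r
            g1 += r * xi
        h00, h01, h11 = information(x, w, b0, b1)
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det  # Newton step: H^{-1} g (2x2 inverse)
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical cohort (all numbers are assumptions for illustration).
random.seed(2022)
N, n1, n2 = 5000, 300, 300
x = [random.gauss(0, 1) for _ in range(N)]
# True phenotype: rare, and observed only for records sent to chart review.
y = [int(random.random() < sigmoid(-4.0 + 1.0 * xi)) for xi in x]
# Error-prone surrogate: assumed sensitivity 0.80 and specificity 0.95.
s = [int(random.random() < (0.80 if yi else 0.05)) for yi in y]

# Wave 1 (surrogate-guided): stratify on the surrogate and oversample
# surrogate positives so that the validated set is enriched for cases.
pos = [i for i in range(N) if s[i] == 1]
neg = [i for i in range(N) if s[i] == 0]
k_pos = min(n1 // 2, len(pos))
k_neg = n1 - k_pos
wave1 = random.sample(pos, k_pos) + random.sample(neg, k_neg)
w1 = [len(pos) / k_pos if s[i] else len(neg) / k_neg for i in wave1]
pilot = fit_weighted_logistic([x[i] for i in wave1], [y[i] for i in wave1], w1)

# Wave 2 (OSMAC-style): sampling probabilities proportional to
# |residual| * ||M^{-1} (1, x_i)||, with M estimated from the pilot fit.
# Assumption: the surrogate s_i stands in for the unknown outcome y_i.
h00, h01, h11 = information([x[i] for i in wave1], w1, *pilot)
det = h00 * h11 - h01 * h01
score = []
for i in range(N):
    u0 = (h11 - h01 * x[i]) / det  # first component of M^{-1} (1, x_i)
    u1 = (h00 * x[i] - h01) / det  # second component
    resid = abs(s[i] - sigmoid(pilot[0] + pilot[1] * x[i]))
    score.append(resid * math.hypot(u0, u1))
total = sum(score)
pi = [sc / total for sc in score]

# Draw with replacement, then fit the final model with weights 1 / pi_i.
wave2 = random.choices(range(N), weights=pi, k=n2)
w2 = [1.0 / pi[i] for i in wave2]
beta = fit_weighted_logistic([x[i] for i in wave2], [y[i] for i in wave2], w2)
print("pilot estimate:", pilot)
print("wave 2 estimate:", beta)
```

Because the wave 2 probabilities depend only on quantities known before chart review (the surrogate and the covariate), the inverse-probability weights in the final fit are known exactly. How the two waves are best pooled into a single estimator is not covered by this sketch.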


Authors who are presenting talks have a * after their name.

Back to the full JSM 2022 program