
All Times EDT

Abstract Details

Activity Number: 172 - Prediction and Misclassification in Biomedical Research
Type: Contributed
Date/Time: Tuesday, August 4, 2020 : 10:00 AM to 2:00 PM
Sponsor: Section on Statistics in Epidemiology
Abstract #309577
Title: A Two-Stage Sampling Approach to Improve Suboptimality in EHR-Based Study Designs
Author(s): Arielle Marks-Anglin* and Yong Chen and Chongliang Luo
Companies: and University of Pennsylvania and University of Pennsylvania
Keywords: Electronic Health Records; Two-Stage Designs; Optimal Subsampling; Misclassification; Rare Event; Observational Studies

Electronic Health Record (EHR) databases are an increasingly valuable resource for observational studies. However, misclassification of EHR-derived outcomes due to imperfect phenotyping leads to bias in association studies, as well as inflated type I error and reduced power. At the same time, manual chart review to validate outcomes is both cost-prohibitive and time-consuming, and a randomly selected validation sample may not yield sufficient cases when the disease is rare. Sampling procedures have been developed for maximizing efficiency in settings where the true disease status is known. However, less work has been done in measurement-constrained settings, particularly for severely imbalanced data. Motivated by this gap, we propose a two-stage sampling algorithm to optimally guide cost-effective chart review in measurement-constrained settings. We validate our method through a simulation study and show that it is robust to differential misclassification, imbalanced data, and various covariate distributions. We then apply our sampling method to a real-world dataset with biomedical applications.
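
As a rough illustration of the rare-disease concern raised in the abstract (not the authors' algorithm), the sketch below computes how many true cases a simple random validation sample would be expected to contain, and how likely it is to fall short of a target count. The budget of 500 charts, the 1% prevalence, the target of 5 cases, and the helper name prob_fewer_than are all hypothetical values chosen for illustration; none of them come from the study.

```python
from math import comb


def prob_fewer_than(k, n, p):
    """Return P(X < k) for X ~ Binomial(n, p).

    Here X is the number of true cases found when n charts are drawn
    by simple random sampling from a population with prevalence p.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


# Hypothetical values, for illustration only (not taken from the abstract).
n_charts = 500      # chart-review budget
prevalence = 0.01   # true disease prevalence
min_cases = 5       # number of validated cases the analyst would like to have

expected_cases = n_charts * prevalence
p_short = prob_fewer_than(min_cases, n_charts, prevalence)

print(f"Expected true cases in the sample: {expected_cases:.1f}")
print(f"P(fewer than {min_cases} cases): {p_short:.3f}")
```

Under these assumed numbers, a uniform random draw has a substantial chance of returning fewer cases than desired, which is the kind of shortfall that motivates a sampling scheme favoring informative records over uniform selection.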

Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program