
All Times EDT

Abstract Details

Activity Number: 286 - Missing Data Methods
Type: Contributed
Date/Time: Wednesday, August 11, 2021 : 1:30 PM to 3:20 PM
Sponsor: Biometrics Section
Abstract #318414
Title: Mining for Equitable, Intelligent Health: Assessing the Impact of Missing Data in Raw Electronic Health Records
Author(s): Emily Getzen* and Qi Long
Companies: Department of Biostatistics at UPenn and Department of Biostatistics at UPenn
Keywords: Electronic Health Records; Missing Data
Abstract:

Electronic health records (EHRs) hold great promise for advancing precision health. However, EHRs in their raw form can present significant analytical challenges: they contain multi-scale data from heterogeneous domains, can be structured or unstructured, and are collected at irregular time intervals and frequencies. Despite these challenges, analyzing raw EHRs directly would save significant pre-processing time and thus encourage real-world adoption in clinical settings. EHRs also reflect inequity: the amount of data recorded varies across patients because of differences in health-seeking behavior, access to care, and other factors. This can bias data collection, so that data for marginalized groups may be less informative because of fragmented care. This inequity can be viewed as a missing data problem. There is growing recognition that the ubiquitous missing data in EHRs, even when analyzed using powerful statistical and machine learning algorithms, can yield biased findings and exacerbate health disparities. In this work, we develop novel methods to simulate missing data in raw EHRs and assess its impact via disease prediction models that incorporate word embedding algorithms.


Authors who are presenting talks have a * after their name.

Back to the full JSM 2021 program