All Times EDT

Abstract Details

Activity Number: 320 - Electronic Health Records, Causal Inference and Miscellaneous
Type: Contributed
Date/Time: Wednesday, August 11, 2021 : 3:30 PM to 5:20 PM
Sponsor: Section on Statistics in Epidemiology
Abstract #318690
Title: Identifying COVID-19 Diagnoses Using Unstructured Electronic Health Records
Author(s): Benjamin Ackerman* and James Roose and Shrujal Baxi and Patrick Gonzales and Sandra D. Griffith
Companies: Flatiron Health and Flatiron Health and Flatiron Health and Flatiron Health and Flatiron Health
Keywords: EHR; COVID-19; NLP; oncology; real-world data; data quality
Abstract:

Real-world data sources, such as electronic health records (EHRs), may produce meaningful insights into the impact of COVID-19 infection on patients (pts) with cancer. Newly developed ICD codes are useful for identifying COVID-19 diagnoses in EHRs; however, there is concern over lagged clinical uptake of these codes and over testing performed outside the EHR system that goes uncaptured. Both may lead to underestimation of COVID-19 diagnoses in EHRs, thereby mischaracterizing the burden of COVID-19 infection on pts with cancer. Using the nationwide Flatiron Health EHR-derived de-identified database, we constructed and refined a natural language processing (NLP) algorithm that flagged ~2400 pts with terms related to COVID-19 in unstructured clinical notes dated Feb 1 to Aug 30, 2020. We manually reviewed charts for 350 randomly selected pts and confirmed 88 pts with documented COVID-19 diagnoses (PPV = 25%, 95% CI = 21-30%). The resulting estimated cohort of 600 pts was nearly five times larger than the cohort estimated using ICD codes alone. Our work highlights the challenges of detecting COVID-19 diagnoses in oncology EHRs with ICD codes and the opportunity to leverage unstructured data to improve cohort selection.
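
The arithmetic behind the reported figures can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not say which interval method was used, so the Wilson score interval here is an assumption, and the function and variable names are hypothetical. The cohort estimate assumes the PPV from the reviewed sample is applied to all ~2400 flagged pts.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (assumed method)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Counts taken from the abstract.
flagged = 2400    # approximate number of pts flagged by the NLP algorithm
reviewed = 350    # randomly selected charts reviewed manually
confirmed = 88    # pts with a documented COVID-19 diagnosis

# Positive predictive value of the NLP flag in the reviewed sample.
ppv = confirmed / reviewed
ci_low, ci_high = wilson_interval(confirmed, reviewed)

# Assumption: the sample PPV is extrapolated to every flagged patient.
estimated_cohort = flagged * ppv

print(f"PPV = {ppv:.1%}, 95% CI = {ci_low:.1%} to {ci_high:.1%}")
print(f"Estimated cohort size: about {estimated_cohort:.0f} pts")
```

Rounded to whole percentages, these values agree with the PPV and interval in the abstract, and the extrapolated cohort rounds to the reported 600 pts. A Wald interval gives the same rounded bounds for these counts.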


Authors who are presenting talks have a * after their name.

Back to the full JSM 2021 program