Abstract Details

Activity Number: 648 - Statistical Challenges in the Analysis of EHR Data
Type: Invited
Date/Time: Thursday, August 2, 2018 : 10:30 AM to 12:20 PM
Sponsor: Health Policy Statistics Section
Abstract #326876 Presentation
Title: Utilizing Statistical Methods for Pre-Processing EHR Data for Analysis
Author(s): Alex Milinovich*
Companies: Cleveland Clinic
Keywords: EHR; Clinical data; Similarity calculations; EMR; Medical data; Ontologies

Raw electronic health record (EHR) data are disorganized and full of uncodified variables. Working directly with EHR data for statistical analysis is a challenge in and of itself. Many data points are duplicated and rely on only a very small set of validation criteria shown to the data entry personnel. Intimate knowledge of the EHR's data structure is necessary for even the simplest of queries. At Cleveland Clinic, less than 5% of the EHR data are codified variables; the rest are identifiers, dates, and free-text entries. To provide the cleanest and most robust datasets for statistical analysis, numerous statistical techniques are used to clean, parse, map, and validate the raw EHR data. The raw data are then taken from both the EHR and other disparate data sources, mapped to discrete ontologies, cleaned and standardized, and finally deposited into a clinical research data repository. Approximately 185 tables from different data sources are condensed into 18 research-ready tables. Through this process, Cleveland Clinic is able to perform live population exploration and produce clean datasets rapidly.
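
The abstract lists similarity calculations and ontologies among its keywords but does not say which similarity measure or ontology is used. As a rough illustration only, the sketch below maps a free-text entry to the closest term in a small ontology using the character-level ratio from Python's standard difflib module. The ontology codes and terms, the function names normalize and map_to_ontology, and the 0.8 threshold are all hypothetical and are not taken from the Cleveland Clinic pipeline.

```python
from difflib import SequenceMatcher

# Hypothetical mini-ontology (code -> preferred term); invented for illustration.
ONTOLOGY = {
    "C001": "type 2 diabetes mellitus",
    "C002": "hypertension",
    "C003": "atrial fibrillation",
}


def normalize(text):
    """Lowercase the text, turn punctuation into spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def map_to_ontology(entry, ontology=ONTOLOGY, threshold=0.8):
    """Return (code, score) for the most similar term.

    The code is None when the best score falls below the threshold
    (an assumed cutoff), so the entry can be routed to manual review.
    """
    query = normalize(entry)
    best_code, best_score = None, 0.0
    for code, term in ontology.items():
        # ratio() is 2*M/T: M matched characters, T total characters in both strings.
        score = SequenceMatcher(None, query, term).ratio()
        if score > best_score:
            best_code, best_score = code, score
    if best_score < threshold:
        return None, best_score
    return best_code, best_score
```

Calling map_to_ontology("Type-2 Diabetes Mellitus") returns the code "C001" with a score of 1.0, because normalization makes the entry identical to the ontology term. An entry with no close term, such as "broken left wrist", comes back with a code of None. A production mapping would more likely use a token-based or clinically informed measure than raw character similarity.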

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program