Online Program


All Times ET

Thursday, June 3
Practice and Applications
Data-Driven Healthcare
Thu, Jun 3, 1:10 PM - 2:45 PM
TBD
 

Sequential Pattern Mining of Electronic Health Record for Early Diagnosis of Amyotrophic Lateral Sclerosis (309832)

William Jin, Community MS 
Cindy Liang, TAMS at UNT 
*Lily Sun, Stanford OHS 

Keywords: machine learning, ALS diagnosis, sequential pattern mining, big data, bioinformatics, epidemiology

Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disease that primarily affects the upper and lower motor neurons. The average survival time for ALS patients is 19 months from diagnosis and 30 months from symptom onset. Diagnosis is based primarily on clinical evaluation, along with a series of tests to rule out diseases that mimic ALS. It remains challenging, with an average delay of 11 to 12 months or more between symptom onset and diagnosis. Early diagnosis is therefore critical to prolonging survival and improving quality of life. One possible tool for early detection is sequential pattern mining, a data mining technique that identifies patterns of ordered events. In this study, we apply sequential pattern mining to electronic health records to predict ALS. Our objective is to determine whether it is an effective tool for early ALS diagnosis.
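To make "patterns of ordered events" concrete, the following minimal Python sketch checks whether a pattern occurs in order within a patient history and computes its support, that is, the fraction of histories containing it. The event names and toy histories are hypothetical and are not drawn from the study's data.

```python
from typing import List, Sequence


def contains_in_order(history: Sequence[str], pattern: Sequence[str]) -> bool:
    """Return True if every event in `pattern` appears in `history`
    in the same relative order (other events may occur in between)."""
    remaining = iter(history)
    # `event in remaining` advances the iterator, so order is enforced.
    return all(event in remaining for event in pattern)


def support(histories: List[Sequence[str]], pattern: Sequence[str]) -> float:
    """Fraction of histories that contain `pattern` as an ordered subsequence."""
    if not histories:
        return 0.0
    hits = sum(contains_in_order(h, pattern) for h in histories)
    return hits / len(histories)


# Hypothetical toy histories (one list of ordered events per patient).
histories = [
    ["muscle_weakness", "emg_test", "neurology_referral"],
    ["emg_test", "muscle_weakness"],
    ["muscle_weakness", "physical_therapy", "emg_test"],
    ["headache", "mri_scan"],
]
pattern = ["muscle_weakness", "emg_test"]

print(support(histories, pattern))  # 0.5: two of the four histories match
```

The second history contains both events but in the reverse order, so it does not count toward the support of the pattern.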

We use cSPADE, a version of SPADE (Sequential Pattern Discovery using Equivalence classes), to predict ALS diagnosis from diagnosis codes, procedure codes, prescribed medications, and clinical evaluations. The data come from an Electronic Health Record (EHR) data set sourced from integrated delivery networks (IDNs) (80%) and group practices (20%). It contains longitudinally connected, patient-level clinical data from a 10% random sample of a total of 101 million patients, with about 10 million patients in each census region, captured from 2007 to 2019. We include 22,000 ALS patients and 10 million non-ALS patients in the study and extract their records. The patients are divided into a training set (90% of patients), used to generate rules, and a test set (10% of patients), used to make predictions. After mining the patients' histories prior to ALS diagnosis for frequent patterns, we evaluate how useful the most relevant patterns are for predicting ALS.
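The train/test workflow described above can be sketched roughly as follows. This is not cSPADE: it is a simplified brute-force stand-in that only considers ordered pairs of events, and the event names, the thresholds `min_support` and `min_confidence`, and the helper functions are all assumptions made for illustration.

```python
import random
from collections import Counter
from itertools import combinations


def ordered_pairs(history):
    """Set of (earlier, later) event pairs that occur in order in a history."""
    return {(a, b) for a, b in combinations(history, 2) if a != b}


def split_patients(patients, test_fraction=0.1, seed=0):
    """Shuffle patients reproducibly and split them into (train, test)."""
    shuffled = list(patients)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def mine_rules(train, min_support=0.5, min_confidence=0.8):
    """Keep pairs that are frequent among ALS patients (support) and that
    mostly occur in ALS patients when they occur at all (confidence)."""
    als_counts, all_counts = Counter(), Counter()
    n_als = sum(1 for _, is_als in train if is_als)
    for history, is_als in train:
        pairs = ordered_pairs(history)
        all_counts.update(pairs)
        if is_als:
            als_counts.update(pairs)
    rules = {}
    for pair, count in als_counts.items():
        confidence = count / all_counts[pair]
        if count / n_als >= min_support and confidence >= min_confidence:
            rules[pair] = confidence
    return rules


def predict(history, rules):
    """Flag a patient if any mined pattern appears in their history."""
    return any(pair in rules for pair in ordered_pairs(history))


# Hypothetical toy cohort: (ordered event history, has ALS diagnosis).
patients = (
    [(["weakness", "emg", "riluzole"], True)] * 4
    + [(["emg", "weakness"], False)] * 8
    + [(["cough", "xray"], False)] * 8
)

train, test = split_patients(patients, test_fraction=0.1)
rules = mine_rules(train)
predictions = [(predict(history, rules), is_als) for history, is_als in test]
print(len(train), len(test))  # 18 2
```

A real sequence miner such as cSPADE handles longer patterns, itemsets within a single visit, and time constraints between events, none of which this sketch attempts.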