Name: 2021 Joint Statistical Meetings
Start: 2021-08-08T07:00:00+00:00
End: 2021-08-12

All Times EDT

Abstract Details

Activity Number:	132 - SLDS CSpeed 1
Type:	Contributed
Date/Time:	Monday, August 9, 2021 : 1:30 PM to 3:20 PM
Sponsor:	Section on Statistical Learning and Data Science
Abstract #318506
Title:	Applications of Machine Learning Methods to Identify Pediatric Patients with De Novo Acute Myeloid Leukemia from a Real-World Data Set
Author(s):	Yimei Li*
Companies:	University of Pennsylvania
Keywords:	machine learning; real world data; pediatric; patient identification; random forest
Abstract:	Comparative effectiveness studies are important for improving outcomes in pediatric patients with acute myeloid leukemia (AML), and such studies require large cohorts drawn from real-world data such as administrative databases. However, identifying patients in such databases is challenging because ICD codes have limited accuracy. Previous cohorts were assembled by manually reviewing each patient's longitudinal daily chemotherapy patterns, a very labor-intensive process. In this study, we attempted to use machine learning methods to replace the manual review. We considered several methods, including a linear support vector machine, random forest, naïve Bayes, and a gradient boosting classifier. We developed the algorithm on 75% of the study sample using four-fold cross-validation and validated it on the remaining 25%. We also performed external validation using a separate data source. Both internal and external validation suggested outstanding performance, with positive predictive value > 81%. We plan to use this approach for future cohort assembly of AML patients, and work is ongoing to extend it to another disease population.
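The evaluation design described in the abstract (a 75% development set, four-fold cross-validation within it, a held-out 25%, and positive predictive value as the reported metric) can be sketched roughly as follows. This is only an illustration using the Python standard library, not the author's code: the function names, the random seed, and the way folds are formed are all assumptions, and model fitting is omitted. Any of the classifiers named in the abstract would be trained on each `train` fold and scored on the matching `val` fold.

```python
import random


def split_development_holdout(n_samples, dev_fraction=0.75, seed=0):
    """Shuffle sample indices and split them into development and hold-out sets.

    The 75% fraction comes from the abstract; the seed is an arbitrary assumption.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(round(n_samples * dev_fraction))
    return indices[:cut], indices[cut:]


def k_fold_indices(indices, k=4):
    """Yield (train, val) index lists for k-fold cross-validation.

    Folds are formed by striding through the already shuffled indices,
    so each index lands in exactly one validation fold.
    """
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


def positive_predictive_value(y_true, y_pred):
    """PPV = true positives / (true positives + false positives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else float("nan")


if __name__ == "__main__":
    dev, holdout = split_development_holdout(100)
    for train, val in k_fold_indices(dev, k=4):
        # A classifier would be fit on `train` and evaluated on `val` here.
        print(len(train), len(val))
    print(len(holdout))
```

PPV is a natural metric for cohort assembly because it measures what fraction of the patients the algorithm flags are true cases, which is what determines how clean the resulting cohort is.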

Authors who are presenting talks have a * after their name.

American Statistical Association