Abstract:
|
Comparative effectiveness studies are important for improving outcomes in pediatric patients with acute myeloid leukemia (AML), and such studies require large cohorts drawn from real-world data such as administrative databases. However, identifying patients in such databases is challenging because International Classification of Diseases (ICD) codes have limited accuracy. Previous cohorts were assembled by manually reviewing each patient's longitudinal daily chemotherapy patterns, a labor-intensive process. In this study we evaluated whether machine learning methods could replace the manual review. We considered multiple methods, including a linear support vector machine, random forest, naïve Bayes, and a gradient boosting classifier. We developed the algorithm on 75% of the study sample using four-fold cross-validation and validated it on the remaining 25%. We also performed external validation using a separate data source. Both internal and external validation indicated strong performance, with a positive predictive value above 81%. We plan to use this approach for future cohort assembly of AML patients, and work is ongoing to extend it to another disease population.
|
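The evaluation design summarized in the abstract (a 75% development set, four-fold cross-validation within it, a 25% hold-out set, and positive predictive value as the reported metric) can be sketched as below. This is only an illustrative outline under stated assumptions: the helper names `split_dev_holdout`, `k_fold_splits`, and `positive_predictive_value` are hypothetical, the 100-patient sample size is made up, and no classifier or study data is included.

```python
import random


def split_dev_holdout(n_patients, dev_fraction=0.75, seed=0):
    """Shuffle patient indices and split them into development and hold-out sets."""
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)
    cut = int(n_patients * dev_fraction)
    return indices[:cut], indices[cut:]


def k_fold_splits(indices, k=4):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def positive_predictive_value(y_true, y_pred):
    """PPV = true positives / all predicted positives (NaN if none predicted)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fp == 0:
        return float("nan")
    return tp / (tp + fp)


# Hypothetical sample of 100 patients: 75 for development, 25 held out.
dev, holdout = split_dev_holdout(100)
folds = list(k_fold_splits(dev, k=4))
```

In such a setup, each candidate classifier would be fit on the `train` indices of every fold and scored on the matching validation indices, with the hold-out set touched only once for the final PPV estimate.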