WSDS 2021 Online Program

All Times EDT

Thursday, October 7

Thu, Oct 7, 2:45 PM - 4:00 PM
Virtual

Speed Session

Accounting for Patient Heterogeneity in Machine Learning Risk Prediction Models (309980)

Jinbo Chen, University of Pennsylvania Perelman School of Medicine
*Ellie Liu, Interlake High School

Keywords: Electronic health records; Machine Learning Models; Health disparity; Heterogeneous Population; Recalibration; Receiver Operating Characteristic Curves

Risk prediction models are often trained using electronic health record (EHR) data, yet little is known about how the varying complexities of EHR data affect the validity and accuracy of statistical and machine learning models. We developed machine learning models from heterogeneous simulated samples with varying disease prevalence and predictor distributions. We observed that a global model developed from a composite sample yielded a reduced area under the ROC curve (AUC), risk calibration that was poor to varying degrees, and compromised predictive accuracy, particularly in populations with low representation in the training data. We proposed recalibrating risk for individual cohorts based on the global model and known summary information on each cohort, enabling improved risk-based decision-making across heterogeneous populations. We applied this method to develop a preliminary model that predicts mortality risk for patients who developed sepsis in the intensive care unit using real EHR data, accounting for risk heterogeneity by ethnicity and age with a large number of EHR predictors. The resulting good calibration is important for alleviating concerns about model-induced healthcare disparities.
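The abstract does not spell out the recalibration procedure. As one illustration of how a single piece of cohort summary information (a known disease prevalence) could be combined with a global model, the sketch below uses intercept recalibration on the logit scale: it finds a shift delta so that the cohort's mean recalibrated risk equals the known prevalence. This is a minimal sketch under stated assumptions, not necessarily the authors' method. It assumes the global model outputs probabilities strictly between 0 and 1 for a sample of the cohort, and the function names (`recalibrate_intercept`, `apply_recalibration`) are hypothetical.

```python
import math


def logit(p):
    # Log-odds of a probability strictly in (0, 1).
    return math.log(p / (1.0 - p))


def expit(x):
    # Numerically stable logistic (inverse-logit) function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def recalibrate_intercept(global_risks, cohort_prevalence, tol=1e-10, max_iter=200):
    """Hypothetical helper: return delta such that
    mean(expit(logit(p) + delta)) == cohort_prevalence.

    Assumption: cohort prevalence is the only summary statistic available.
    The mean adjusted risk is strictly increasing in delta, so bisection
    converges to the unique solution.
    """
    if not 0.0 < cohort_prevalence < 1.0:
        raise ValueError("cohort_prevalence must be in (0, 1)")
    logits = [logit(p) for p in global_risks]

    def mean_risk(delta):
        return sum(expit(z + delta) for z in logits) / len(logits)

    lo, hi = -50.0, 50.0  # wide bracket on the logit scale
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if mean_risk(mid) < cohort_prevalence:
            lo = mid  # predicted risk too low: shift upward
        else:
            hi = mid  # predicted risk too high: shift downward
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def apply_recalibration(global_risks, delta):
    # Hypothetical helper: shift every patient's log-odds by the same delta.
    return [expit(logit(p) + delta) for p in global_risks]


if __name__ == "__main__":
    # Illustrative numbers only: global-model risks for three patients in a
    # cohort whose known prevalence is 5%.
    risks = [0.1, 0.2, 0.3]
    delta = recalibrate_intercept(risks, 0.05)
    print(delta, apply_recalibration(risks, delta))
```

Because the same shift is added to every patient's log-odds, the ranking of patients within a cohort is unchanged. This approach can correct calibration-in-the-large for a cohort, but it cannot recover lost discrimination (AUC). Using richer summary information, such as subgroup-specific predictor distributions, would require a more elaborate adjustment.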

American Statistical Association