Abstract:
|
Superlearning is an ensemble machine learning method that uses cross-validation to select the optimal algorithm among all weighted combinations of a set of candidate algorithms. We applied superlearning to the prediction of 30-day neonatal postoperative mortality. Patients in our study sample underwent a variety of procedures, and data were available on a large number of preoperative characteristics, so a flexible prediction algorithm was appealing. The candidate set comprised classification trees, stepwise logistic regression models, penalized logistic regression models, generalized boosted models, and random forests. Patients treated in 2012-13 were used to develop the superlearner (N=6499, 3.6% mortality), and those treated in 2014 formed the external validation sample (N=3552, 3.8% mortality). In an analysis using all available predictors, the superlearner achieved a lower cross-validated mean squared error than every individual algorithm. It showed excellent discrimination, with an area under the receiver operating characteristic curve of 0.91 in the development sample and 0.87 in the validation sample. The superlearner was well calibrated in the development dataset but not in the validation dataset. Performance was similar after variable screening and when only preterm neonates were considered.
|
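As an illustration of the weighting step described in the abstract, the following is a minimal sketch in Python, not the implementation used in the study. It assumes squared-error loss, two toy candidate learners, and synthetic data. All function names (`cv_predictions`, `simplex_grid`, `super_learner_weights`, `fit_mean`, `fit_stump`) are hypothetical, and the coarse grid search over convex weights stands in for the constrained optimization that a real superlearner implementation would typically use.

```python
import itertools
import random


def cv_predictions(learners, X, y, n_folds=5, seed=0):
    """Return out-of-fold predictions: one row per observation, one column per learner."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    Z = [[0.0] * len(learners) for _ in y]
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        for j, fit in enumerate(learners):
            # Fit on the training folds only, then predict the held-out fold.
            predict = fit([X[i] for i in train], [y[i] for i in train])
            for i in held_out:
                Z[i][j] = predict(X[i])
    return Z


def mse(weights, Z, y):
    """Mean squared error of the weighted combination of the columns of Z."""
    return sum(
        (sum(w * z for w, z in zip(weights, row)) - t) ** 2 for row, t in zip(Z, y)
    ) / len(y)


def simplex_grid(k, steps):
    """Yield nonnegative weight vectors of length k, in multiples of 1/steps, summing to 1."""
    for combo in itertools.product(range(steps + 1), repeat=k - 1):
        if sum(combo) <= steps:
            yield tuple(c / steps for c in combo) + ((steps - sum(combo)) / steps,)


def super_learner_weights(Z, y, steps=20):
    """Pick the convex weights that minimize cross-validated MSE (grid-search stand-in)."""
    return min(simplex_grid(len(Z[0]), steps), key=lambda w: mse(w, Z, y))


# Two toy candidate learners (assumptions for illustration only).
def fit_mean(X, y):
    m = sum(y) / len(y)
    return lambda x: m


def fit_stump(X, y, threshold=0.5):
    left = [t for x, t in zip(X, y) if x[0] <= threshold]
    right = [t for x, t in zip(X, y) if x[0] > threshold]
    overall = sum(y) / len(y)
    lo = sum(left) / len(left) if left else overall
    hi = sum(right) / len(right) if right else overall
    return lambda x: hi if x[0] > threshold else lo


# Synthetic binary outcome data; unrelated to the study sample.
rng = random.Random(1)
X = [[rng.random()] for _ in range(200)]
y = [1 if rng.random() < (0.8 if x[0] > 0.5 else 0.1) else 0 for x in X]

Z = cv_predictions([fit_mean, fit_stump], X, y)
weights = super_learner_weights(Z, y)
print("weights:", weights, "cross-validated MSE:", mse(weights, Z, y))
```

Because the grid includes the weight vectors that put all mass on a single learner, the selected combination can never have a higher cross-validated MSE than any individual candidate within this sketch. A full superlearner would then refit each candidate on all development data and combine the fitted models with these weights.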