Abstract:
|
Superlearning is an ensemble machine learning method that uses cross-validation to select the optimal algorithm among all weighted combinations of a set of candidate algorithms. We applied superlearning to the prediction of 30-day neonatal postoperative mortality. Patients in our study sample underwent a variety of procedures, and data were available on a large number of preoperative characteristics, so a flexible prediction algorithm was appealing. The candidate algorithms were stepwise logistic regression models, penalized logistic regression models, generalized boosted models, and random forests. Patients treated in 2012-13 were used to develop the superlearner (N = 6499, 3.6% mortality), and those treated in 2014 formed the external validation sample (N = 3552, 3.8% mortality). In an analysis using all available predictors, the superlearner had a lower cross-validated mean squared error than any individual algorithm. It showed excellent discrimination, with an area under the receiver operating characteristic curve greater than 0.99 in the development sample and 0.86 in the validation sample, and it also showed good calibration. Performance was similar after variable screening.
|
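As a rough illustration of the superlearning step summarized above, here is a minimal Python/NumPy sketch, not the study's implementation. It assumes several things that the abstract does not state. The candidate library is replaced by simple hypothetical stand-ins: an intercept-only model and two ridge-penalized logistic regressions, rather than the stepwise, penalized, boosted and random forest models used in the study. The ensemble weights are constrained to be nonnegative and to sum to one, a common superlearner choice, and are found by projected gradient descent on the mean squared error of out-of-fold predictions. The data are synthetic, and all function names are illustrative.

```python
import numpy as np

def kfold_indices(n, v, rng):
    """Split a random permutation of 0..n-1 into v roughly equal folds."""
    return np.array_split(rng.permutation(n), v)

# Hypothetical stand-in learners (NOT the study's library). Each takes (X, y)
# and returns a function mapping new X to predicted probabilities.
def fit_mean(X, y):
    """Intercept-only learner: predicts the training-set event rate."""
    p = y.mean()
    return lambda Xn: np.full(len(Xn), p)

def fit_ridge_logistic(X, y, lam, iters=25):
    """L2-penalized logistic regression fitted by Newton-Raphson.
    The intercept (column 0 of the design matrix) is left unpenalized."""
    Xd = np.column_stack([np.ones(len(X)), X])
    pen = lam * np.eye(Xd.shape[1])
    pen[0, 0] = 0.0
    beta = np.zeros(Xd.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        H = Xd.T @ (Xd * (p * (1 - p))[:, None]) + pen  # penalized Hessian
        g = Xd.T @ (y - p) - pen @ beta                  # penalized score
        beta = beta + np.linalg.solve(H, g)
    return lambda Xn: 1.0 / (1.0 + np.exp(-(beta[0] + Xn @ beta[1:])))

def project_simplex(v):
    """Euclidean projection onto {a : a >= 0, sum(a) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - 1)[0][-1]
    theta = (css[rho] - 1) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def convex_weights(Z, y, steps=2000):
    """Minimize mean((Z @ a - y)**2) over the simplex by projected gradient descent."""
    n, K = Z.shape
    a = np.full(K, 1.0 / K)
    step = n / (2.0 * np.linalg.norm(Z, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    for _ in range(steps):
        grad = 2.0 * Z.T @ (Z @ a - y) / n
        a = project_simplex(a - step * grad)
    return a

def super_learner(X, y, learners, v=10, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    Z = np.zeros((n, len(learners)))  # out-of-fold predictions, one column per learner
    for test in kfold_indices(n, v, rng):
        train = np.setdiff1d(np.arange(n), test)
        for k, fit in enumerate(learners):
            Z[test, k] = fit(X[train], y[train])(X[test])
    cv_mse = ((Z - y[:, None]) ** 2).mean(axis=0)  # cross-validated MSE per candidate
    alpha = convex_weights(Z, y)
    fits = [fit(X, y) for fit in learners]  # refit every candidate on all data
    predict = lambda Xn: np.column_stack([f(Xn) for f in fits]) @ alpha
    return alpha, cv_mse, predict

def auc(y, p):
    """Mann-Whitney AUC: P(random event outranks random non-event), ties count 1/2."""
    d = p[y == 1][:, None] - p[y == 0][None, :]
    return (d > 0).mean() + 0.5 * (d == 0).mean()

# Synthetic data with a rare binary outcome (NOT the study data).
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 5))
logit = -3.5 + X @ np.array([1.0, -0.8, 0.5, 0.0, 0.0])
y = (rng.random(2000) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

learners = [fit_mean,
            lambda X, y: fit_ridge_logistic(X, y, lam=0.01),
            lambda X, y: fit_ridge_logistic(X, y, lam=100.0)]
alpha, cv_mse, predict = super_learner(X, y, learners)
print("weights:", alpha.round(3), "candidate CV MSE:", cv_mse.round(4))
print("apparent AUC:", round(auc(y, predict(X)), 3))
```

The simplex constraint means the ensemble can never have a higher out-of-fold mean squared error than the best single candidate, because each candidate on its own is one vertex of the simplex. A cross-validated MSE for the superlearner itself would need an additional outer cross-validation loop around `super_learner`, which this sketch omits. The printed AUC is also computed on the same data used for fitting.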