Online Program

Thursday, May 17
Machine Learning Applications
Thu, May 17, 6:15 PM - 7:15 PM
Regency Ballroom B
 

Performance of Cross-Validation of Binary Longitudinal Finite Mixture Models: A Simulation and Application. (304731)

Presentation

*Thom J Taylor, Nicklaus Childrens Research Institute 

Keywords: Longitudinal Finite Mixture Models, k-fold cross validation, simulation, health care data

Introduction. For many health care institutions, the large feature space necessary for accurate prediction may not be available. This may leave unexplained heterogeneity in the data used for prediction. Longitudinal Finite Mixture Models (LFMMs) are one way of addressing this unexplained heterogeneity when predicting outcomes important to health care, such as unplanned inpatient readmission or emergency department utilization. Recently, k-fold cross-validation (CV) has been proposed for variance reduction in the predictions made from LFMMs. However, k-fold CV of LFMMs may result in biased estimation of the true number of classes. The aim of this study was to assess potential bias in class identification in k-fold CV of LFMMs.

Methods. We simulated 3- and 4-class datasets, varying the patient random-effect variance over {0.1, 0.3, 1.0} and the per-class sample size over {200, 500, 800}, for binary events spread across a maximum of 5 years of longitudinal observation. The main outcome of interest was whether the number of folds k, in {2, 5, 10}, resulted in average hold-out sample prediction AICs that identified the true number of classes.

Results. Only 37% of k-fold CV LFMMs recovered the true underlying class structure. Across the 54 conditions simulated, k-fold CV LFMMs consistently resulted in hold-out AIC averages indicating 1 fewer class than the true number of classes (e.g., 2 classes suggested by average AICs in a known 3-class model). Findings will be presented graphically for each of the 54 conditions.

Discussion. K-fold CV LFMMs may underestimate the true unobserved heterogeneity relative to full-sample estimation of LFMMs. These findings have important implications for the use of LFMMs with health care administrative data to predict outcomes in heterogeneous populations. This issue may be particularly relevant for smaller health and medical data sources that lack extensive prediction feature spaces for heterogeneous health care dynamics.
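
The simulation design described in the Methods can be sketched roughly as follows. This is an illustrative sketch only, not the author's code: the factor levels come from the abstract, but the function names, the logistic random-intercept data-generating model, the intercept and slope arguments, the fixed 5 yearly observations per patient, and the patient-level fold split are all assumptions made for the example. Fitting the LFMMs and computing hold-out AICs is not shown.

```python
import itertools
import math
import random

# Design factors as stated in the abstract.
N_CLASSES = (3, 4)
RANDOM_EFFECT_VARIANCES = (0.1, 0.3, 1.0)
CLASS_SAMPLE_SIZES = (200, 500, 800)
K_FOLDS = (2, 5, 10)


def design_grid():
    """Return every combination of the four design factors."""
    factors = itertools.product(
        N_CLASSES, RANDOM_EFFECT_VARIANCES, CLASS_SAMPLE_SIZES, K_FOLDS
    )
    return [
        {"n_classes": c, "re_variance": v, "n_per_class": n, "k_folds": k}
        for c, v, n, k in factors
    ]


def simulate_class(n_patients, intercept, slope, re_variance, n_years=5, rng=None):
    """Simulate yearly binary events for one latent class.

    Assumed model (hypothetical, not from the abstract): each patient has a
    normal random intercept with the given variance, and the event
    probability in each year follows a logistic curve in time.
    """
    rng = rng or random.Random()
    sd = math.sqrt(re_variance)
    rows = []
    for _ in range(n_patients):
        u = rng.gauss(0.0, sd)  # patient-level random effect
        events = []
        for year in range(n_years):
            p = 1.0 / (1.0 + math.exp(-(intercept + slope * year + u)))
            events.append(1 if rng.random() < p else 0)
        rows.append(events)
    return rows


def fold_indices(n_patients, k, rng):
    """Split patient indices into k disjoint folds (assumed patient-level split)."""
    idx = list(range(n_patients))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


if __name__ == "__main__":
    grid = design_grid()
    print(len(grid))  # 2 * 3 * 3 * 3 conditions
    rng = random.Random(2018)
    sample = simulate_class(200, intercept=-1.0, slope=0.2, re_variance=0.3, rng=rng)
    folds = fold_indices(len(sample), k=5, rng=rng)
    print([len(f) for f in folds])
```

Crossing the two class counts with the three variances, three per-class sample sizes, and three fold counts gives the 54 conditions reported in the Results. Splitting at the patient level, rather than the observation level, keeps each patient's repeated measures together in one fold; whether the study did this is not stated in the abstract.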