
All Times EDT

Abstract Details

Activity Number: 245 - SLDS CSpeed 4
Type: Contributed
Date/Time: Wednesday, August 11, 2021 : 10:00 AM to 11:50 AM
Sponsor: Section on Statistical Learning and Data Science
Abstract #319084
Title: A Random Forest Method with Variable Selection for Developing Prediction Models for Binary Outcomes with Clustered and Longitudinal Data
Author(s): Jaime Lynn Speiser*
Companies: Wake Forest School of Medicine
Keywords: variable selection; random forest; longitudinal data; clustered data; prediction model; feature selection

Machine learning methods are gaining popularity for developing prediction models for datasets with a large number of predictors, particularly in the setting of clustered and longitudinal data. Binary Mixed Model (BiMM) forest is a promising machine learning algorithm that may be applied to develop prediction models for clustered and longitudinal binary outcomes. Although such methods exist, variable selection for them has not been evaluated in simulation studies. We conducted a simulation study to compare BiMM forest with variable selection (backward elimination or stepwise selection) to standard generalized linear mixed model variable selection methods (shrinkage and backward elimination). Across the simulated scenarios, BiMM forest with backward elimination generally offered high computational efficiency, similar or higher predictive performance (accuracy and area under the receiver operating characteristic curve), and similar or higher ability to identify the correct variables compared with the generalized linear mixed model methods. We applied the methods to develop prediction models for mobility disability in older adults using longitudinal data from the Health, Aging and Body Composition Study.

Authors who are presenting talks have a * after their name.

Back to the full JSM 2021 program