Keywords: random forest, mixed model, binary outcome
Clustered binary outcomes and datasets with many predictor variables are frequently encountered in clinical research (e.g. longitudinal studies). Generalized linear mixed models (GLMMs), which are typically employed for clustered endpoints, face challenges in some scenarios, particularly with high-dimensional datasets. We propose a new method called Binary Mixed Model (BiMM) forest, which combines random forest and GLMM methodology. BiMM forest is a flexible and stable method that naturally models interactions among predictors and nonlinear relationships between predictors and the outcome, and it can be employed efficiently with high-dimensional data. Simulation studies show that BiMM forest achieves similar or superior accuracy compared to standard methods for clustered binary outcomes. The method is applied to a real dataset from the Acute Liver Failure Study Group. BiMM forest offers an alternative method for modeling binary outcomes in clustered datasets and may be applied in a wide range of research settings.