Abstract:
Tree-based methods are widely used for classification in health sciences research, where data are often clustered. In this paper, we extend the original classification and regression tree (CART) paradigm (Breiman et al. 1984) to the clustered binary outcome setting, where covariates are observed at both the cluster and individual levels. Using residuals from a null generalized linear mixed model as the outcome, we build a regression tree that partitions the covariate space into rectangles. This approach accounts for the cluster-correlated design without explicitly modeling the correlation structure, which allows us to use the original CART machinery for tree growing, pruning and cross-validation. The class prediction for each terminal node of the final tree is based on that node's estimated success probability. Through extensive simulations, we compare our residual-based classification tree with CART. The methods are illustrated using data from a kidney cancer study and a childhood vaccination study. Finally, to improve predictive accuracy and address the instability of a single tree, we extend our methodology to grow an ensemble of trees, or forest.
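The residual-then-tree idea can be illustrated with a small sketch. It is not the authors' implementation. Several assumptions are made for illustration: a shrinkage estimate of each cluster's success rate stands in for the fitted null generalized linear mixed model, only a single split on one covariate is searched (one CART step, with no pruning or cross-validation), and a 0.5 cut-off on the node's success probability is assumed for class prediction. The names `null_residuals`, `best_split`, `node_class` and the `shrink` parameter are hypothetical.

```python
from statistics import mean


def null_residuals(y, cluster, shrink=5.0):
    """Residuals y - p_hat from a crude stand-in for a null mixed model.

    Assumption: instead of fitting a GLMM, each cluster's observed success
    rate is shrunk toward the overall rate (empirical-Bayes style); `shrink`
    is a hypothetical pseudo-count controlling the amount of shrinkage.
    """
    overall = mean(y)
    groups = {}
    for yi, c in zip(y, cluster):
        groups.setdefault(c, []).append(yi)
    p_hat = {
        c: (sum(v) + shrink * overall) / (len(v) + shrink)
        for c, v in groups.items()
    }
    return [yi - p_hat[c] for yi, c in zip(y, cluster)]


def best_split(x, r):
    """Threshold on covariate x minimizing the residual sum of squares
    (the CART regression criterion applied to the residuals r)."""
    def sse(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best_sse, best_t = float("inf"), None
    for t in sorted(set(x))[:-1]:  # the largest value would leave an empty right node
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        s = sse(left) + sse(right)
        if s < best_sse:
            best_sse, best_t = s, t
    return best_t


def node_class(y_node, threshold=0.5):
    """Predict class 1 if the node's observed success proportion exceeds
    the (assumed) threshold, else class 0."""
    return int(mean(y_node) > threshold)
```

For example, with `x = [1, 2, 3, 4, 5, 6, 7, 8]`, `y = [0, 0, 0, 0, 1, 1, 1, 1]` and clusters `["a", "a", "b", "b", "a", "a", "b", "b"]`, both clusters have the same success rate, so the residuals are simply `y - 0.5` and `best_split` places the cut at `x <= 4`. The left node then predicts class 0 and the right node class 1. A full tree would apply `best_split` recursively over all covariates.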