Abstract:
|
Datasets with a large number of covariates may conceal latent cluster structures and, within these homogeneous subgroups, functional relationships between subsets of predictors and the outcome of interest; these may not be easily discovered using currently available variable selection methods. We propose a novel and general method, based on mixtures of regression trees, to identify relevant predictors associated with the outcome under a latent (unobserved) cluster structure in the dataset. We adopt a Bayesian perspective, which allows us to simultaneously uncover homogeneous subgroups and identify covariates that have nonlinear relationships with the outcome, while also accounting for interaction effects. To this end, we use an MCMC algorithm that alternates, at each iteration, between updating the cluster structure with a Gibbs sampler and updating the regression tree within each cluster with the Bayesian CART algorithm. We illustrate the proposed method on simulated and real datasets, including comparisons with competing approaches.
|