Abstract Details

Activity Number: 539 - SPEED: Bayesian Methods and Applications in the Life and Social Sciences
Type: Contributed
Date/Time: Wednesday, August 1, 2018 : 11:35 AM to 12:20 PM
Sponsor: ENAR
Abstract #332632
Title: Variable Selection and Cluster Identification Using Mixture of Regression Trees
Author(s): Emanuele Mazzola* and Mahlet Tadesse and Giovanni Parmigiani
Companies: Dana-Farber Cancer Institute and Georgetown University and Harvard T.H. Chan School of Public Health / Dana-Farber Cancer Institute
Keywords: Bayesian; CART; Mixture; Regression; Trees

Datasets with a large number of covariates may conceal latent cluster structures and, within these homogeneous subgroups, functional relationships between subsets of predictors and the outcome of interest; these may not be easily discovered using currently available variable selection methods. We propose a novel and general method, based on mixtures of regression trees, to identify relevant predictors associated with the outcome, assuming a latent (unobserved) cluster structure in the dataset. We adopt a Bayesian perspective, which allows us to simultaneously uncover homogeneous subgroups and identify covariates that have nonlinear relationships with the outcome, while also accounting for interaction effects. We achieve this using an MCMC algorithm that alternates, at each iteration, between updating the cluster structure with a Gibbs sampler and updating the regression tree within each cluster with the Bayesian CART algorithm. The proposed method will be illustrated on simulated and real datasets, including comparisons with competing approaches.
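
The alternating scheme described in the abstract can be illustrated with a heavily simplified sketch. This is not the authors' implementation. It assumes depth-one trees (stumps) in place of full Bayesian CART moves, a fixed number of clusters K, a known noise variance, and an independence proposal drawn from a uniform prior over splits. All function names and default values (fit_mixture_of_stumps, sigma2, tau2, alpha) are hypothetical.

```python
import numpy as np

def leaf_log_marginal(y, sigma2, tau2):
    # Log marginal likelihood of one leaf: y_i ~ N(mu, sigma2), mu ~ N(0, tau2).
    n, s = y.size, y.sum()
    denom = sigma2 + n * tau2
    return (-0.5 * n * np.log(2 * np.pi * sigma2)
            + 0.5 * np.log(sigma2 / denom)
            - 0.5 / sigma2 * (np.sum(y ** 2) - tau2 * s ** 2 / denom))

def stump_log_marginal(X, y, split, sigma2, tau2):
    # A stump is one split (variable j, threshold t) with two leaves.
    j, t = split
    left = X[:, j] <= t
    return (leaf_log_marginal(y[left], sigma2, tau2)
            + leaf_log_marginal(y[~left], sigma2, tau2))

def draw_leaf_means(X, y, split, sigma2, tau2, rng):
    # Conjugate normal posterior draw for the mean of each leaf.
    j, t = split
    left = X[:, j] <= t
    means = []
    for mask in (left, ~left):
        n, s = mask.sum(), y[mask].sum()
        denom = sigma2 + n * tau2
        means.append(rng.normal(tau2 * s / denom, np.sqrt(sigma2 * tau2 / denom)))
    return np.array(means)

def predict(X, split, means):
    j, t = split
    return np.where(X[:, j] <= t, means[0], means[1])

def fit_mixture_of_stumps(X, y, K=2, n_iter=200, sigma2=0.25, tau2=10.0,
                          alpha=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape

    def draw_split():
        # Assumed prior: uniform over variables and over observed values.
        j = rng.integers(p)
        return (j, X[rng.integers(n), j])

    z = rng.integers(K, size=n)
    splits = [draw_split() for _ in range(K)]
    means = [draw_leaf_means(X[z == k], y[z == k], splits[k], sigma2, tau2, rng)
             for k in range(K)]
    selected = np.zeros((K, p))

    for _ in range(n_iter):
        # Gibbs step 1: mixing weights given the current labels.
        pi = rng.dirichlet(alpha + np.bincount(z, minlength=K))

        # Gibbs step 2: cluster labels given weights and the current trees.
        logp = np.stack(
            [np.log(pi[k]) - 0.5 * (y - predict(X, splits[k], means[k])) ** 2 / sigma2
             for k in range(K)], axis=1)
        prob = np.exp(logp - logp.max(axis=1, keepdims=True))
        prob /= prob.sum(axis=1, keepdims=True)
        u = rng.random((n, 1))
        z = np.minimum((prob.cumsum(axis=1) < u).sum(axis=1), K - 1)

        # Tree step: Metropolis-Hastings update of each cluster's split.
        # The proposal equals the prior, so the acceptance ratio reduces
        # to the ratio of marginal likelihoods.
        for k in range(K):
            Xk, yk = X[z == k], y[z == k]
            proposal = draw_split()
            log_ratio = (stump_log_marginal(Xk, yk, proposal, sigma2, tau2)
                         - stump_log_marginal(Xk, yk, splits[k], sigma2, tau2))
            if np.log(rng.random()) < log_ratio:
                splits[k] = proposal
            means[k] = draw_leaf_means(Xk, yk, splits[k], sigma2, tau2, rng)
            selected[k, splits[k][0]] += 1

    # Last label draw, last trees, and per-cluster split-variable frequencies.
    return z, splits, means, selected / n_iter
```

Calling fit_mixture_of_stumps(X, y) on an (n, p) covariate array X and a length-n outcome vector y returns the final cluster labels, the final split and leaf means of each cluster, and the fraction of iterations in which each covariate was used as the split variable in each cluster. Those frequencies are only a crude stand-in for the variable selection summaries a full method would report, and they are subject to label switching across clusters.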

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program