Abstract:
|
In a wide variety of contexts, high-dimensional covariates can be placed into separate pre-defined groups. To better utilize such high-dimensional covariates in both regression and survival prediction, we propose a flexible ensemble procedure that performs approximate Bayesian model averaging over a collection of predictions from Bayesian additive regression trees (BART). Our approximate Bayesian model averaging scheme relies on a novel representation of the BART prior as an approximate Gaussian process. We show that this approximation leads to a new distance between covariates and that it permits easy computation of posterior model probabilities for both continuous and time-to-event outcomes. Through a series of simulation studies, we demonstrate that our Bayesian model averaging method allows for more effective exploration of the covariate space, efficient computation, improved prediction, and model assessment. We apply our method to a multi-site study of lung cancer survival and discuss its use in determining the relative importance of key clinical and genetic features.
|