Abstract:

We propose boosting methods for random forests in an exponential family framework. For a model E(Y|X) = g^{-1}(θ(X)), we propose estimating θ(X) via a small number of random forest boosting steps. To do so, we fit random forests to boosting pseudo-responses defined by the derivative of the log-likelihood at the current prediction. The leaves of this forest are then updated post hoc to maximize the log-likelihood, and this process can be iterated for a small number of boosting steps. Using a small number of boosting steps allows us to extend existing variance estimators for random forests to our boosted estimate, thereby constructing prediction intervals with good asymptotic properties. We demonstrate on both real and simulated data that even one boosting step reduces bias and improves mean squared error compared to the standard random forest algorithm.
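The procedure in the abstract can be sketched for a concrete exponential family. The sketch below, which is illustrative and not the authors' implementation, uses a Poisson model with log link, so the pseudo-response (the derivative of the log-likelihood at the current prediction θ̂) is y - exp(θ̂). For simplicity the post hoc update here is a single shared step size chosen by a likelihood line search, rather than the per-leaf likelihood maximization described in the abstract; the data, the `poisson_loglik` helper, and all tuning values are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Simulated Poisson data with log link: E(Y|X) = exp(theta(X)).
n = 500
X = rng.uniform(-1.0, 1.0, size=(n, 2))
theta_true = 1.0 + 0.8 * X[:, 0]
y = rng.poisson(np.exp(theta_true)).astype(float)

def poisson_loglik(y, theta):
    # Poisson log-likelihood, dropping the constant log(y!) term.
    return float(np.sum(y * theta - np.exp(theta)))

# Initialize theta at the intercept-only MLE.
theta_hat = np.full(n, np.log(y.mean()))

# One boosting step: fit a random forest to the pseudo-responses,
# i.e. the derivative of the log-likelihood at the current prediction:
# d/dtheta [y * theta - exp(theta)] = y - exp(theta).
pseudo = y - np.exp(theta_hat)
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, pseudo)
step = rf.predict(X)

# Simplified post hoc update: a scalar line search over step sizes that
# maximizes the log-likelihood (a stand-in for the per-leaf update).
alphas = np.linspace(0.0, 2.0, 41)
lls = [poisson_loglik(y, theta_hat + a * step) for a in alphas]
theta_boost = theta_hat + alphas[int(np.argmax(lls))] * step

print(poisson_loglik(y, theta_hat), poisson_loglik(y, theta_boost))
```

Since the step-size grid includes 0, one boosting step can never decrease the training log-likelihood; iterating simply repeats the fit-then-update cycle starting from `theta_boost`.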
