Abstract:
|
Since their inception in the 1980s, regression trees have become one of the most widely used non-parametric prediction methods. Tree-structured regression yields a histogram reconstruction of the regression surface, where the bins correspond to the terminal nodes of recursive partitioning. Trees are powerful, yet susceptible to overfitting. Strategies against overfitting have traditionally relied on pruning greedily grown trees. The Bayesian framework offers an alternative remedy through priors that charge smaller trees. While the consistency of trees and their ensembles has been studied quite extensively, theoretical understanding of their Bayesian counterparts has been largely missing. In this paper, we take a step towards understanding why and when Bayesian trees and their ensembles do not overfit. To address this question, we study the speed at which the posterior concentrates around the true smooth regression function. We establish new theoretical results showing that Bayesian (ensemble) regression trees (a) achieve optimal rates up to a log factor, (b) adapt to the unknown level of smoothness and (c) can perform effective dimension reduction.
|