Abstract:
By casting tree building as a change-point detection problem, we show that a regression tree can be pruned efficiently using suitably modified information criteria, and we discuss some applications to tree-based ensemble learning methods. We prove that one of the proposed pruning approaches, based on a modified Bayesian information criterion (BIC), is consistent for identifying the correct tree model when that model exists as a subtree of a larger tree. In practice, the resulting trees are simpler and can achieve prediction accuracy comparable to that of trees obtained with standard cost-complexity pruning. We briefly discuss an extension to random forests that adaptively prunes trees to prevent excessive variance. Because the extension includes regular random forests as a special case, it is expected to perform at least as well, at negligible additional computational cost.
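A minimal sketch of the change-point view, under stated assumptions: there is a single predictor and the responses `y` are already sorted by it. A regression tree is then a piecewise-constant segmentation whose split points act as change points. Nested candidate subtrees come from best-first greedy splitting, and each is scored with an ordinary BIC-style penalty that charges one parameter per leaf mean and one per split location. This penalty is a hypothetical stand-in, not the modified criterion proposed in the paper, and the function names `grow_path`, `bic`, and `prune_by_bic` are illustrative only.

```python
import math
from typing import List, Tuple


def rss(y: List[float]) -> float:
    """Residual sum of squares of a segment around its own mean."""
    if not y:
        return 0.0
    m = sum(y) / len(y)
    return sum((v - m) ** 2 for v in y)


def best_split(y: List[float]) -> Tuple[int, float]:
    """Return (index, gain) of the split of y that most reduces RSS; index -1 if none helps."""
    base = rss(y)
    best_idx, best_gain = -1, 0.0
    for i in range(1, len(y)):
        gain = base - rss(y[:i]) - rss(y[i:])
        if gain > best_gain:
            best_idx, best_gain = i, gain
    return best_idx, best_gain


def grow_path(y: List[float], max_leaves: int) -> List[List[int]]:
    """Best-first growth: repeatedly split the leaf whose best split reduces RSS most.

    Each entry is a list of segment boundaries; entry k has k + 1 leaves, and every
    entry is a subtree (coarser segmentation) of the ones that follow it.
    """
    bounds = [0, len(y)]
    path = [list(bounds)]
    while len(bounds) - 1 < max_leaves:
        best = None  # (absolute split position, gain)
        for a, b in zip(bounds, bounds[1:]):
            idx, gain = best_split(y[a:b])
            if idx > 0 and (best is None or gain > best[1]):
                best = (a + idx, gain)
        if best is None:
            break  # no split reduces RSS any further
        bounds = sorted(bounds + [best[0]])
        path.append(list(bounds))
    return path


def bic(y: List[float], bounds: List[int]) -> float:
    """Gaussian BIC-style score: k leaf means plus (k - 1) split locations as parameters."""
    n = len(y)
    k = len(bounds) - 1
    total = sum(rss(y[a:b]) for a, b in zip(bounds, bounds[1:]))
    total = max(total, 1e-12)  # guard against log(0) for a perfect fit
    return n * math.log(total / n) + (2 * k - 1) * math.log(n)


def prune_by_bic(y: List[float], max_leaves: int = 8) -> List[int]:
    """Pick the subtree on the growth path with the smallest BIC-style score."""
    return min(grow_path(y, max_leaves), key=lambda b: bic(y, b))


if __name__ == "__main__":
    # Mean jumps from about 0 to about 5 halfway through (small alternating noise).
    y = [0.1 * (-1) ** i for i in range(20)] + [5 + 0.1 * (-1) ** i for i in range(20)]
    print(prune_by_bic(y, max_leaves=6))
```

On this toy sequence the penalty outweighs the tiny RSS reductions from further splits, so the two-leaf tree with a single split at the jump, `[0, 20, 40]`, is selected. Replacing `bic` with a different criterion changes only the scoring step, not the growth path.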