Name: 2021 Joint Statistical Meetings
Start: 2021-08-08T07:00:00+00:00
End: 2021-08-12

All Times EDT

Activity Number:	440 - SLDS CSpeed 8
Type:	Contributed
Date/Time:	Thursday, August 12, 2021 : 4:00 PM to 5:50 PM
Sponsor:	Section on Statistical Learning and Data Science
Abstract #318067
Title:	A Modified Bayesian Information Criterion for Improving the Performance of Tree-Based Learning Algorithms Without the Use of Cross-Validation
Author(s):	Nikola Surjanovic* and Andrew Henrey and Thomas Loughin
Companies:	Simon Fraser University and Finning and Simon Fraser University
Keywords:	regression trees; pruning; information criteria; cross-validation; machine learning; random forest
Abstract:	Casting tree building as a change-point detection problem, we show that it is possible to prune a regression tree efficiently using properly modified information criteria, and we discuss some applications to tree-based ensemble learning methods. We prove that one of the proposed pruning approaches using a modified Bayesian information criterion is consistent for identifying the correct tree model when it exists as a subtree within a larger tree. In practice, we obtain simplified trees that can have prediction accuracy comparable to trees obtained using standard cost-complexity pruning. We briefly discuss an extension to random forests that adaptively prunes trees to prevent excessive variance. The extension includes regular random forests as a special case, and is therefore expected to perform at least as well, with a negligible additional computational cost.
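The change-point view of pruning described in the abstract can be illustrated with a small sketch. The abstract does not state the form of the modified criterion, so the code below uses a generic BIC-type score, n*log(RSS/n) + penalty * (number of leaves) * log(n), where the multiplier `penalty` is a hypothetical stand-in for whatever modification the authors propose. All names (`Node`, `grow`, `criterion`, `prune`) are illustrative assumptions, the data are simulated, and the greedy bottom-up collapse is only one plausible pruning scheme, not the authors' algorithm.

```python
import math
import random


class Node:
    """A node of a one-dimensional regression tree covering ys[lo:hi]."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.left = None
        self.right = None

    def is_leaf(self):
        return self.left is None


def rss(ys, lo, hi):
    """Residual sum of squares of ys[lo:hi] around its own mean."""
    seg = ys[lo:hi]
    mean = sum(seg) / len(seg)
    return sum((y - mean) ** 2 for y in seg)


def grow(ys, lo, hi, depth, min_leaf=5):
    """Greedily grow a tree: each split is the change point minimising RSS."""
    node = Node(lo, hi)
    if depth == 0 or hi - lo < 2 * min_leaf:
        return node
    best = min(range(lo + min_leaf, hi - min_leaf + 1),
               key=lambda s: rss(ys, lo, s) + rss(ys, s, hi))
    node.left = grow(ys, lo, best, depth - 1, min_leaf)
    node.right = grow(ys, best, hi, depth - 1, min_leaf)
    return node


def leaves(node):
    """Return the leaves of the subtree rooted at node, left to right."""
    if node.is_leaf():
        return [node]
    return leaves(node.left) + leaves(node.right)


def criterion(root, ys, penalty):
    """Generic BIC-type score (assumed form, not the paper's criterion)."""
    n = len(ys)
    lv = leaves(root)
    total = sum(rss(ys, leaf.lo, leaf.hi) for leaf in lv)
    return n * math.log(max(total, 1e-12) / n) + penalty * len(lv) * math.log(n)


def prune(node, root, ys, penalty):
    """Bottom-up: collapse a split unless collapsing raises the score."""
    if node.is_leaf():
        return
    prune(node.left, root, ys, penalty)
    prune(node.right, root, ys, penalty)
    if node.left.is_leaf() and node.right.is_leaf():
        before = criterion(root, ys, penalty)
        saved = (node.left, node.right)
        node.left = node.right = None      # tentatively collapse
        if criterion(root, ys, penalty) > before:
            node.left, node.right = saved  # the split was worth keeping


# Simulated responses, already ordered by the predictor: one true change point.
rng = random.Random(0)
ys = ([rng.gauss(0.0, 0.5) for _ in range(100)]
      + [rng.gauss(5.0, 0.5) for _ in range(100)])

tree = grow(ys, 0, len(ys), depth=4)
n_full = len(leaves(tree))
prune(tree, tree, ys, penalty=2.0)
n_pruned = len(leaves(tree))
print("leaves before pruning:", n_full, "after pruning:", n_pruned)
```

No cross-validation folds are needed here: each candidate collapse is judged by a single evaluation of the score on the training data. How large `penalty` must be to remove spurious splits reliably is exactly the kind of question the modified criterion is meant to answer, so the value 2.0 above is arbitrary.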

Authors who are presenting talks have a * after their name.
