Abstract:
|
Since their introduction by Breiman (2001), random forests have proved to be high-performing, general-purpose algorithms that require little tuning. Although analyses such as Breiman (2001) and Biau (2012) suggest that the estimator converges to its mean under the random generation of trees, which implies that the number of trees should not be seen as a source of overfitting, both experiments (Segal (2004)) and theory (Biau (2012) and Denil et al. (2014)) suggest that random forests' use of fully grown trees creates the potential for overfitting and higher asymptotic risk. We provide a way to smooth a random forest (in both the classification and regression settings), which can be framed as reducing the depth of the trees within the forest. We present a set of simulation studies showing that this smoothing can be useful in low-density regions where the underlying function changes less smoothly (using examples from Criminisi et al. (2012)), new examples showcasing the extent to which this overfitting arises in higher dimensions, and results on commonly used statistical machine learning datasets.
|