Breiman’s random forest can be interpreted as an implicit kernel generator, in which the resulting proximity matrix serves as a data-driven kernel. Under mild assumptions, this kernel can be shown to approach a Laplacian kernel asymptotically. It has also recently been shown that the Laplacian kernel underlies other tree-based ensembles, such as Mondrian forests and BART. The kernel perspective of random forests has been used to develop a principled framework for the theoretical investigation of their statistical properties. However, the practical utility of the links between kernels and random forests has not been widely explored.
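To make the kernel view concrete, the snippet below computes the proximity matrix of a fitted forest: entry (i, j) is the fraction of trees in which observations i and j fall into the same terminal node. This is a minimal sketch that assumes scikit-learn's `RandomForestRegressor`, whose `apply` method returns the leaf index of each sample in each tree. The helper name `rf_proximity`, the simulated data and the hyperparameters are hypothetical and chosen for illustration only; they are not the setup used in this work.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rf_proximity(forest, X):
    """Hypothetical helper: random forest proximity kernel.

    K[i, j] = fraction of trees in which X[i] and X[j] share a leaf.
    """
    leaves = forest.apply(X)  # shape (n_samples, n_trees): leaf index per tree
    n_samples, n_trees = leaves.shape
    K = np.zeros((n_samples, n_samples))
    for t in range(n_trees):
        col = leaves[:, t]
        # Boolean indicator: True where two samples land in the same leaf of tree t
        K += col[:, None] == col[None, :]
    # Average over trees gives values in [0, 1] with ones on the diagonal
    return K / n_trees


# Illustrative simulated data (assumption, not from the study)
rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 3))
y = np.sin(4 * X[:, 0]) + 0.1 * rng.normal(size=50)

forest = RandomForestRegressor(
    n_estimators=200, min_samples_leaf=5, random_state=0
).fit(X, y)
K = rf_proximity(forest, X)
```

Each tree contributes a matrix that is block-diagonal of all-ones blocks once samples are grouped by leaf, so it is positive semidefinite; their average is therefore symmetric, positive semidefinite and has a unit diagonal, which is what allows the proximity matrix to be treated as a kernel (for example, as a precomputed Gram matrix).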
Our work focuses on the interplay between kernel methods and random forests. We elucidate the properties of data-driven random forest kernels in a simulation study with continuous, binary and survival outcomes. We also present a real-life example showing how these insights may be leveraged in practice. Finally, we discuss further extensions of random forest kernels in the context of Gaussian process prediction and interpretable prototypical regression.