Keywords: Random Forest, Random Forest Kernel, Kernel Methods, Big Data
Following the groundbreaking work of Leo Breiman, several reports have shown that the random forest kernel represented by the proximity matrix asymptotically approaches a Laplacian kernel. It has recently also been shown that the Laplacian kernel underlies other tree-based ensembles such as Mondrian forests and Bayesian Additive Regression Trees (BART). The kernel perspective on random forests provides a suitable framework for theoretically elucidating the statistical properties of random forests. The focus of our work is the interplay between kernel methods and random forests and how it can be leveraged in practice. We investigate the performance of data-driven random forest kernels in simulations under various setups with continuous and binomial endpoints. We show that prediction models built by combining random forest kernels with regularized linear models are competitive and often outperform the random forest used in the traditional way. We also demonstrate the utility of random forest kernels in real-life examples. Finally, we discuss further extensions of random forest kernels to survival analysis and interpretable prototypical learning, along with ensuing visualizations that facilitate insights into data complexity.
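The pipeline described above — extracting a proximity-based kernel from a fitted random forest and feeding it to a regularized linear model — can be sketched as follows. This is a minimal illustration, not the authors' exact procedure; it assumes the proximity of two samples is the fraction of trees in which they land in the same leaf, and uses scikit-learn's `KernelRidge` with a precomputed kernel as the regularized linear model.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.kernel_ridge import KernelRidge

# Toy data with a continuous endpoint.
X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

# Fit a random forest in the usual way.
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Leaf index of every sample in every tree, shape (n_samples, n_trees).
leaves = rf.apply(X)

# Proximity kernel: fraction of trees in which two samples share a leaf.
# K is symmetric with ones on the diagonal.
K = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)

# Regularized linear model in the space induced by the forest kernel.
model = KernelRidge(kernel="precomputed", alpha=1.0).fit(K, y)
preds = model.predict(K)
```

For out-of-sample prediction, the kernel between test and training samples is computed the same way, by comparing `rf.apply(X_test)` against the training leaf indices.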