Online Program

All Times EDT

Thursday, June 4
Machine Learning
Software & Data Science Technologies
Machine Learning and Software and Data Science Technologies Posters
Thu, Jun 4, 2:00 PM - 5:00 PM
TBD
 

Investigation of the Interplay Between Random Forest and Kernel Methods in Big Data (308425)

*Richard Baumgartner, Merck & Co., Inc.
Dai Feng, Data and Statistical Sciences, AbbVie Inc. 

Keywords: Random Forest, Random Forest Kernel, Kernel Methods, Big Data

Following the groundbreaking work of Leo Breiman, several reports have shown that the random forest kernel, represented by the proximity matrix, asymptotically approaches a Laplacian kernel. It has also recently been shown that the Laplacian kernel underlies other tree-based ensembles, such as Mondrian forests and Bayesian Additive Regression Trees (BART). The kernel perspective on random forests provides a suitable framework for theoretically elucidating their statistical properties. Our work focuses on the interplay between kernel methods and random forests and on how to leverage it in practice. We investigate the performance of data-driven random forest kernels in simulations under various setups that include continuous and binomial endpoints. We show that prediction models built from random forest kernels followed by regularized linear models are competitive with, and often outperform, the random forest used in the traditional way. We also demonstrate the utility of random forest kernels in real-life examples. Finally, we discuss further extensions of random forest kernels to survival analysis and interpretable prototypical learning, along with the ensuing visualizations that facilitate insight into data complexity.
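
As an illustration of the pipeline described above (a random forest kernel followed by a regularized linear model), the following is a minimal sketch and not the authors' implementation. It assumes scikit-learn, takes the kernel entry for two observations to be the fraction of trees in which they fall into the same leaf (one common definition of the proximity matrix), and uses kernel ridge regression as a stand-in for the regularized model. The helper name rf_proximity_kernel, the synthetic data, and all hyperparameter values are hypothetical choices made only for this example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import train_test_split


def rf_proximity_kernel(forest, X_a, X_b):
    """Hypothetical helper: fraction of trees in which each pair of
    observations (one from X_a, one from X_b) lands in the same leaf."""
    leaves_a = forest.apply(X_a)  # leaf index per tree, shape (n_a, n_trees)
    leaves_b = forest.apply(X_b)  # leaf index per tree, shape (n_b, n_trees)
    # Compare leaf indices tree by tree, then average over the trees.
    same_leaf = leaves_a[:, None, :] == leaves_b[None, :, :]
    return same_leaf.mean(axis=2)


# Synthetic continuous endpoint, used here only for illustration.
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 1: fit a random forest; its leaves define the data-driven kernel.
forest = RandomForestRegressor(n_estimators=100, min_samples_leaf=5, random_state=0)
forest.fit(X_train, y_train)

# Step 2: build the kernel matrices (train vs. train, test vs. train).
K_train = rf_proximity_kernel(forest, X_train, X_train)
K_test = rf_proximity_kernel(forest, X_test, X_train)

# Step 3: fit a ridge-regularized model on the precomputed kernel.
model = KernelRidge(alpha=1.0, kernel="precomputed")
model.fit(K_train, y_train)
pred = model.predict(K_test)
```

The pairwise comparison allocates an array of size n_a x n_b x n_trees, so for large data sets the kernel would need to be computed in blocks or tree by tree. For a binomial endpoint, the same kernel matrices could instead be passed to a classifier that accepts a precomputed kernel.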