Abstract:
Random Forests (RF) based on decision trees are at the cutting edge of supervised machine learning, and they have been especially successful for genomic prediction problems. Stabilized RFs, or iterative random forests (iRF), have shown great promise for discovering high-order biological interactions, a task central to advancing functional genomics and precision medicine. However, a theoretical understanding of how tree-based methods exploit high-order feature interactions is still missing.
In this talk, we propose a new regression model, called Locally Spiky Sparse (LSS), which is biologically inspired and does not rely on Lipschitz (smoothness) assumptions. The LSS model assumes that the regression function is a linear combination of piecewise constant, discontinuous Boolean interaction functions. This makes it possible to study theoretically the model selection consistency of interactions, a useful criterion for interaction discovery in practice. We show that, under the LSS model, the tree structures obtained by RF lead to consistent discovery of the Boolean interactions with high probability as the sample size increases. We illustrate our results through data-inspired simulations.
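To make the model description concrete, here is a minimal sketch of simulating data from an LSS-style regression function, f(x) = β_0 + Σ_j β_j ∏_{k ∈ S_j} 1(x_k > γ_k). It assumes that each Boolean interaction is an AND of one-sided threshold indicators, that features are uniform on [0, 1], and that noise is Gaussian. The threshold direction, the noise model, and the helper names (`boolean_interaction`, `lss_mean`, `simulate_lss`) are illustrative assumptions, not the speakers' exact specification. Fitting RF and extracting interactions from its trees is not shown.

```python
import random
from typing import List, Sequence, Tuple

# Assumed representation (hypothetical): an interaction S_j is a list of
# (feature_index, threshold) pairs; its Boolean function is 1 only when
# every listed feature exceeds its threshold.
Interaction = List[Tuple[int, float]]


def boolean_interaction(x: Sequence[float], interaction: Interaction) -> float:
    """Piecewise constant, discontinuous indicator: 1.0 if all conditions hold, else 0.0."""
    return 1.0 if all(x[k] > gamma for k, gamma in interaction) else 0.0


def lss_mean(x: Sequence[float], beta0: float, betas: Sequence[float],
             interactions: Sequence[Interaction]) -> float:
    """Regression function: intercept plus a linear combination of Boolean interactions."""
    return beta0 + sum(b * boolean_interaction(x, s) for b, s in zip(betas, interactions))


def simulate_lss(n: int, p: int, beta0: float, betas: Sequence[float],
                 interactions: Sequence[Interaction], noise_sd: float = 0.1,
                 seed: int = 0):
    """Draw X ~ Uniform[0, 1]^p and y = f(X) + Gaussian noise (assumed noise model)."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(p)] for _ in range(n)]
    y = [lss_mean(x, beta0, betas, interactions) + rng.gauss(0.0, noise_sd) for x in X]
    return X, y


# Example: an order-2 interaction on features 0 and 1, plus a main effect on feature 2.
interactions = [[(0, 0.5), (1, 0.5)], [(2, 0.3)]]
betas = [2.0, 1.0]
X, y = simulate_lss(n=500, p=10, beta0=0.0, betas=betas, interactions=interactions)
```

Only features 0, 1, and 2 affect the response in this example; the other seven are noise features. An interaction-discovery method would ideally recover the sets {0, 1} and {2}.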