
All Times EDT

Abstract Details

Activity Number: 184 - Recent Advances in Statistical Machine Learning
Type: Invited
Date/Time: Tuesday, August 10, 2021 : 1:30 PM to 3:20 PM
Sponsor: IMS
Abstract #316614
Title: Provable Boolean Interaction Recovery from Tree Structures Obtained via Random Forests
Author(s): Bin Yu* and Merle Behr and Yu Wang and Xiao Li
Companies: University of California, Berkeley and University of California, Berkeley and University of California, Berkeley and University of California, Berkeley
Keywords: random forests; decision trees; iterative random forests; Boolean interactions; consistent discovery; high-order
Abstract:

Random Forests (RF) based on decision trees are at the cutting edge of supervised machine learning. They are especially successful for genomics prediction problems. Stabilized RFs, or iterative random forests (iRF), have shown great promise for high-order biological interaction discovery, which is central to advancing functional genomics and precision medicine. However, a theoretical understanding of how tree-based methods utilize high-order feature interactions is still missing.

In this talk, we first propose a new regression model, called Locally Spiky Sparse (LSS), which is biologically inspired and does not rely on Lipschitz assumptions. The LSS model assumes that the regression function is a linear combination of a set of piecewise constant, discontinuous Boolean interaction functions. This makes possible theoretical studies of the model selection consistency of interactions, which is a useful metric for interaction discovery in practice. We show that, under the LSS model, the tree structures obtained by RF lead with high probability to consistent discovery of the Boolean interactions as the data size increases. Our results are illustrated through data-inspired simulations.
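
To make the shape of such a regression function concrete, here is a minimal Python sketch of a linear combination of Boolean interaction functions. It is only an illustration under stated assumptions: the coefficients, thresholds, the direction of the inequalities, the uniform features, the Gaussian noise, and the names `INTERACTIONS`, `boolean_interaction`, `lss_mean`, and `simulate` are all hypothetical and are not taken from the abstract.

```python
import random

# Hypothetical LSS-style regression function; every value here is illustrative.
# Each entry is (coefficient, [(feature_index, threshold), ...]). The Boolean
# interaction equals 1 only when every listed feature is at or below its threshold.
INTERCEPT = 0.0
INTERACTIONS = [
    (1.0, [(0, 0.5), (1, 0.5)]),  # 1.0 * 1{x0 <= 0.5} * 1{x1 <= 0.5}
    (2.0, [(2, 0.3)]),            # 2.0 * 1{x2 <= 0.3}
]


def boolean_interaction(x, terms):
    """Return 1 if all threshold conditions hold for x, else 0."""
    return int(all(x[k] <= t for k, t in terms))


def lss_mean(x):
    """Piecewise constant mean: a linear combination of Boolean interactions."""
    return INTERCEPT + sum(
        beta * boolean_interaction(x, terms) for beta, terms in INTERACTIONS
    )


def simulate(n, p=5, noise_sd=0.1, seed=0):
    """Draw n samples with p uniform features and additive Gaussian noise (assumed)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.random() for _ in range(p)]
        y = lss_mean(x) + rng.gauss(0.0, noise_sd)
        data.append((x, y))
    return data
```

The mean function jumps between a small number of constant levels as features cross their thresholds, which is the discontinuous, non-Lipschitz behavior described above. In this sketch, only features 0, 1, and 2 carry signal, so an interaction discovery method would be expected to recover the sets {0, 1} and {2}.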


Authors who are presenting talks have a * after their name.

Back to the full JSM 2021 program