Abstract Details

Activity Number: 518
Type: Invited
Date/Time: Wednesday, August 3, 2016 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #319180
Title: Iterative Random Forests: Stable Identification of High-Order Interactions in Heterogeneous and High-Dimensional Data
Author(s): Sumanta Basu* and Bin Yu
Companies: University of California at Berkeley and University of California at Berkeley

Integrative analysis of large, heterogeneous datasets poses a central challenge in many areas of science. Tools exist to detect important main effects and low-order interactions between pairs or small subsets of parameters; however, detecting nonlinear, high-order interactions at real-world sample sizes remains a fundamentally unsolved problem. Through extensive and realistic simulation, we developed a method based on Random Forests (RF) for detecting high-order interactions in low-sample regimes, at the same order of computational cost as the base algorithm. We regularize RF using soft dimension reduction and adaptive iterative refitting, and then decode the fitted data representation by analyzing feature usage in decision paths. We call our approach "iterative Random Forests", or iRF, and the general class of algorithms "Introspective Learning", to connote the importance of self-interrogation followed by iteration. We demonstrate the usefulness of iRF in two motivating studies: modeling enhancer sequences in Drosophila melanogaster, and identifying chromatin-RNA interactions at alternatively spliced exons in human cells. In both settings, iRF has similar or better predictive power than existing approaches and provides new insights into relationships among the features. Current challenges in the biosciences motivated the development of iRF, but the algorithm is applicable to any prediction problem in which features are well defined.
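
To make the description above concrete, the following is a minimal, standard-library Python sketch of one possible reading of the abstract. It is not the authors' implementation. It assumes that "adaptive iterative refitting" means regrowing the forest with split candidates sampled in proportion to the previous iteration's Gini importance, and that "analyzing feature usage in decision paths" can be approximated by counting the feature sets on paths to leaves that predict class 1. All function names, parameters, and the toy data are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(X, y, rows, feats):
    """Best (gain, feature, threshold) among the candidate features, or None."""
    parent, best = gini([y[i] for i in rows]), None
    for j in feats:
        for t in sorted({X[i][j] for i in rows})[:-1]:
            left = [y[i] for i in rows if X[i][j] <= t]
            right = [y[i] for i in rows if X[i][j] > t]
            child = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or parent - child > best[0]:
                best = (parent - child, j, t)
    return best

def grow(X, y, rows, weights, rng, depth, path, paths, importance):
    """Grow one tree, sampling split candidates in proportion to feature weight."""
    labels = [y[i] for i in rows]
    split = None
    if depth > 0 and len(set(labels)) == 2:
        mtry = max(1, int(len(weights) ** 0.5))
        feats = set(rng.choices(range(len(weights)), weights=weights, k=mtry))
        split = best_split(X, y, rows, sorted(feats))
    if split is None or split[0] <= 0.0:
        # Leaf: record the features used on the path if the leaf predicts class 1.
        if path and 2 * sum(labels) > len(labels):
            paths[path] += 1
        return
    gain, j, t = split
    importance[j] += gain * len(rows)  # accumulate weighted Gini decrease
    left_rows = [i for i in rows if X[i][j] <= t]
    right_rows = [i for i in rows if X[i][j] > t]
    for side in (left_rows, right_rows):
        grow(X, y, side, weights, rng, depth - 1, path | {j}, paths, importance)

def iterative_forest(X, y, n_iter=3, n_trees=20, depth=3, seed=0):
    """Refit the forest n_iter times, reweighting features by importance each time."""
    rng = random.Random(seed)
    n_features = len(X[0])
    weights = [1.0] * n_features  # first iteration: uniform feature sampling
    for _ in range(n_iter):
        importance, paths = [0.0] * n_features, Counter()
        for _ in range(n_trees):
            rows = [rng.randrange(len(X)) for _ in X]  # bootstrap sample
            grow(X, y, rows, weights, rng, depth, frozenset(), paths, importance)
        total = sum(importance)
        if total > 0:
            weights = [v / total for v in importance]
    return weights, paths  # paths come from the final iteration only

# Toy data (hypothetical): the label depends on an interaction of features 0 and 1.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(8)] for _ in range(100)]
y = [int(row[0] > 0.5 and row[1] > 0.5) for row in X]

weights, paths = iterative_forest(X, y)
top_two = sorted(sorted(range(8), key=lambda j: -weights[j])[:2])
print(top_two, paths.most_common(3))
```

In this sketch the reweighting step plays the role of soft dimension reduction: uninformative features are sampled less often in later iterations, so the feature sets recorded on decision paths concentrate on the features that interact.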

Authors who are presenting talks have a * after their name.

Back to the full JSM 2016 program
