Abstract Details

Activity Number: 453 - Novel Theory and Methods in Big Data Analytics
Type: Invited
Date/Time: Wednesday, August 1, 2018 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Learning and Data Science
Abstract #326640
Title: Iterative Random Forests (IRF) to Discover Predictive and Stable High-Order Interactions
Author(s): Bin Yu* and Sumanta Basu and Karl Kumbier and Ben Brown
Companies: UC Berkeley and Cornell University and UC Berkeley and LBNL and University of Birmingham
Keywords: random forests; random intersection trees; decision trees; ensemble; stability; feature-weighted

Understanding high-order interactions among features in supervised learning presents a substantial statistical challenge. Building on Random Forests (RF), Random Intersection Trees (RIT), and extensive, biologically inspired simulations, we developed iterative Random Forests (iRF). iRF trains a feature-weighted ensemble of decision trees to detect stable, high-order interactions at a computational cost similar to that of RF. We demonstrate iRF for high-order interaction discovery in two prediction problems: enhancer activity in the early Drosophila embryo and alternative splicing of primary transcripts in human-derived cell lines. In Drosophila, among the 20 pairwise transcription factor interactions that iRF identifies as stable (returned in more than half of bootstrap replicates), 80% have been previously reported as physical interactions. Novel third-order interactions suggest high-order relationships that are candidates for follow-up experiments. In human-derived cells, iRF rediscovered a central role of H3K36me3 in chromatin-mediated splicing regulation and identified novel fifth- and sixth-order interactions, indicative of multivalent nucleosomes with specific roles in splicing regulation.
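
The stability criterion mentioned above (an interaction counts as stable when it is returned in more than half of bootstrap replicates) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `stable_interactions`, the `threshold` parameter, and the placeholder feature names "A" through "D" are assumptions made for illustration, and the per-replicate interaction sets are taken as given rather than computed from forests.

```python
from collections import Counter

def stable_interactions(replicate_results, threshold=0.5):
    """Return {interaction: stability score} for interactions recovered
    in more than `threshold` of the bootstrap replicates.

    `replicate_results` is a list with one entry per bootstrap replicate;
    each entry is an iterable of interactions, and each interaction is an
    iterable of feature names (hypothetical input format).
    """
    n_replicates = len(replicate_results)
    counts = Counter()
    for interactions in replicate_results:
        # frozenset ignores feature order; the outer set counts each
        # interaction at most once per replicate.
        counts.update({frozenset(inter) for inter in interactions})
    scores = {inter: c / n_replicates for inter, c in counts.items()}
    # Strict inequality: "more than half" when threshold is 0.5.
    return {inter: s for inter, s in scores.items() if s > threshold}

# Placeholder data: four bootstrap replicates over made-up features.
replicates = [
    [("A", "B"), ("A", "B", "C")],
    [("B", "A")],
    [("A", "B"), ("C", "D")],
    [("C", "D")],
]
stable = stable_interactions(replicates)
```

In this toy input only the pair {A, B} passes the filter, since it appears in three of four replicates; {C, D} appears in exactly half and is excluded by the strict inequality.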

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program