All Times EDT

Abstract Details

Activity Number: 215 - Contributed Poster Presentations: Section on Statistical Learning and Data Science
Type: Contributed
Date/Time: Tuesday, August 4, 2020 : 10:00 AM to 2:00 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #312833
Title: Incorporating Group Structure into Tree-Based Algorithms and Group Selection Through Importance Measures
Author(s): Jiabei Yang*, Emily Dodwell, Ritwik Mitra, and DeDe Paul
Companies: Brown University School of Public Health; Data Science & AI Research, AT&T Labs; Data Science & AI Research, AT&T Labs; Data Science & AI Research, AT&T Labs
Keywords: Tree-based algorithm; Group feature selection; Importance measure; Machine learning

In many supervised learning problems, sets of input variables have a group structure that reflects underlying associations. In such cases, modeling strategies that account for these groupings (e.g., group lasso) are more appropriate than treating the variables individually. However, to our knowledge, few tree-based algorithms incorporate group structure into the splitting criteria in a computationally efficient manner, especially for high-dimensional data. Here, we propose summarizing the variables within each group through group-wise principal component analysis and using the resulting principal components to fit tree-based algorithms. We then propose new group variable importance measures and group variable selection methods for decision trees and random forests. Simulation studies are presented to show the comparative benefits of our method. The proposed algorithm will be applied to gene expression data sets for tumor classification, where the genes are grouped through independent component analysis following a previous analysis.
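The group-wise summarization step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how many components are kept per group, how group importance is defined, or which tree software is used. The function names (`groupwise_pca`, `stump_gini_decrease`, `aggregate_group_importance`), the choice of one component per group, the single-split Gini decrease standing in for a tree's splitting criterion, and the sum-within-group importance are all assumptions made for illustration, as is the synthetic data.

```python
import numpy as np


def groupwise_pca(X, groups, n_components=1):
    """Replace each group of columns with its leading principal components.

    X: (n_samples, n_features) array; groups: dict name -> list of column indices.
    Returns the stacked component scores and the group label of each score column.
    """
    scores, labels = [], []
    for name, cols in groups.items():
        block = X[:, cols] - X[:, cols].mean(axis=0)  # center within the group
        # SVD of the centered block: columns of U * S are the PC scores.
        U, S, _ = np.linalg.svd(block, full_matrices=False)
        k = min(n_components, len(S))
        scores.append(U[:, :k] * S[:k])
        labels.extend([name] * k)
    return np.hstack(scores), labels


def stump_gini_decrease(z, y):
    """Largest Gini impurity decrease from one threshold split on z (binary y).

    Assumption: a single split stands in for a full tree's splitting criterion.
    """
    def gini(v):
        p = v.mean()
        return 2.0 * p * (1.0 - p)

    order = np.argsort(z)
    z_sorted, y_sorted = z[order], y[order]
    n, parent, best = len(y), gini(y), 0.0
    for i in range(1, n):
        if z_sorted[i] == z_sorted[i - 1]:
            continue  # no threshold separates tied values
        left, right = y_sorted[:i], y_sorted[i:]
        decrease = parent - (i / n) * gini(left) - ((n - i) / n) * gini(right)
        best = max(best, decrease)
    return best


def aggregate_group_importance(component_importance, labels):
    """Sum per-component importances within each group (illustrative choice)."""
    totals = {}
    for imp, name in zip(component_importance, labels):
        totals[name] = totals.get(name, 0.0) + float(imp)
    return totals


# Synthetic example (hypothetical data): the "signal" group drives the label.
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=n)
X = np.column_stack([
    latent + 0.1 * rng.normal(size=n),  # group "signal"
    latent + 0.1 * rng.normal(size=n),
    rng.normal(size=n),                 # group "noise"
    rng.normal(size=n),
])
y = (latent > 0).astype(int)
groups = {"signal": [0, 1], "noise": [2, 3]}

Z, labels = groupwise_pca(X, groups, n_components=1)
importance = aggregate_group_importance(
    [stump_gini_decrease(Z[:, j], y) for j in range(Z.shape[1])], labels
)
```

In this toy setting the "signal" group should receive a larger aggregated importance than the "noise" group. In practice the component scores in `Z` would be passed to a decision tree or random forest, and the fitted model's per-component importances would be aggregated by group in the same way.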

Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program