
Abstract Details

Activity Number: 165 - SLDS CSpeed 2
Type: Contributed
Date/Time: Tuesday, August 10, 2021, 10:00 AM to 11:50 AM (EDT)
Sponsor: Section on Statistical Learning and Data Science
Abstract #318620
Title: Ensemble Learning for Ensuring Cross-Study Replicability of Boosting
Author(s): Cathy Wang* and Pragya Sur and Prasad Patil and Giovanni Parmigiani
Companies: Harvard T.H. Chan School of Public Health and Harvard University and Boston University School of Public Health and Harvard University
Keywords: ensemble learning; boosting; multi-study; machine learning
Abstract:

Cross-study replicability is a powerful model evaluation criterion that emphasizes the generalizability of predictions. Recent work in multi-study learning has investigated two approaches for training replicable prediction models: (1) merging all of the datasets and training a single model, and (2) cross-study learning, which trains a separate model on each dataset and ensembles the resulting predictions. We study boosting in a multi-study setting and compare merging with cross-study learning in the presence of potential heterogeneity in predictor-outcome relationships across datasets. We provide theoretical guidelines for determining whether it is more beneficial to merge or to ensemble when ridge boosting is used as the prediction model. We analytically characterize, and confirm via simulations, a transition point between merging and ensembling for ridge boosting. We illustrate our findings in a breast cancer data application.
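The following is a minimal Python sketch, not the authors' implementation, illustrating the two strategies the abstract compares: merging all studies into one training set versus fitting one ridge-boosted model per study and averaging the predictions. The L2-boosting routine with ridge base learners, the equal-weight ensemble, and all simulation parameters (number of studies, heterogeneity level coef_sd, regularization strength lam, shrinkage) are illustrative assumptions.

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

def simulate_study(n=100, p=10, coef_sd=0.3):
    """Simulate one study as a study-specific perturbation of a shared coefficient vector."""
    beta_shared = np.ones(p)
    beta_study = beta_shared + rng.normal(0, coef_sd, p)  # between-study heterogeneity
    X = rng.normal(size=(n, p))
    y = X @ beta_study + rng.normal(size=n)
    return X, y

# Training studies and an independent test study
studies = [simulate_study() for _ in range(4)]
X_test, y_test = simulate_study()

def ridge_boost(X, y, n_rounds=50, lam=10.0, shrinkage=0.1):
    """L2-boosting with ridge base learners: repeatedly fit ridge to the current residuals."""
    pred = np.zeros(len(y))
    learners = []
    for _ in range(n_rounds):
        base = Ridge(alpha=lam).fit(X, y - pred)
        pred += shrinkage * base.predict(X)
        learners.append(base)
    return learners

def boost_predict(learners, X, shrinkage=0.1):
    """Sum the shrunken contributions of all boosting rounds."""
    return shrinkage * sum(base.predict(X) for base in learners)

# (1) Merging: pool all studies and fit a single boosted model
X_merged = np.vstack([X for X, _ in studies])
y_merged = np.concatenate([y for _, y in studies])
merged_model = ridge_boost(X_merged, y_merged)
mse_merged = np.mean((boost_predict(merged_model, X_test) - y_test) ** 2)

# (2) Cross-study learning: fit one boosted model per study, average the predictions
study_models = [ridge_boost(X, y) for X, y in studies]
ens_pred = np.mean([boost_predict(m, X_test) for m in study_models], axis=0)
mse_ensemble = np.mean((ens_pred - y_test) ** 2)

print(f"Merged MSE:   {mse_merged:.3f}")
print(f"Ensemble MSE: {mse_ensemble:.3f}")

In this toy setup, varying coef_sd loosely mimics the heterogeneity regime the abstract analyzes: small values favor merging, while larger values tend to favor the ensemble.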


Presenting authors are marked with a * after their name.
