Activity Number: 165 - SLDS CSpeed 2
Type: Contributed
Date/Time: Tuesday, August 10, 2021 : 10:00 AM to 11:50 AM
Sponsor: Section on Statistical Learning and Data Science
Abstract #318620
Title: Ensemble Learning for Ensuring Cross-Study Replicability of Boosting
Author(s): Cathy Wang* and Pragya Sur and Prasad Patil and Giovanni Parmigiani
Companies: Harvard T.H. Chan School of Public Health and Harvard University and Boston University School of Public Health and Harvard University
Keywords: ensemble learning; boosting; multi-study; machine learning
Abstract:
Cross-study replicability is a powerful model evaluation criterion that emphasizes generalizability of predictions. Recent work in multi-study learning has investigated two approaches for training replicable prediction models: (1) merging all the datasets and training a single model, and (2) cross-study learning, which involves training a separate model on each dataset and ensembling the resulting predictions. We study boosting in a multi-study setting and compare merging with cross-study learning in the presence of potential heterogeneity in predictor-outcome relationships across datasets. We provide theoretical guidelines for determining whether it is more beneficial to merge or to ensemble when ridge boosting is used as the prediction model. We analytically characterize, and confirm via simulations, a transition point between merging and ensembling for ridge boosting. We illustrate our findings in a breast cancer data application.
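
The two training strategies compared in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' method or simulation design: the function names (fit_ridge_boosting, simulate_study, compare), the linear data-generating model, the equal ensemble weights, and all parameter values (lam, n_steps, step, sigma_beta) are hypothetical choices made here. "Ridge boosting" is read as repeatedly fitting a ridge regression to the current residuals and adding a shrunken version of that fit to the model.

```python
import numpy as np


def fit_ridge_boosting(X, y, lam=1.0, n_steps=50, step=0.5):
    """Boost ridge base learners on residuals; return accumulated coefficients."""
    p = X.shape[1]
    # Ridge operator mapping a residual vector to coefficients:
    # (X'X + lam * I)^{-1} X'
    ridge_op = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    beta = np.zeros(p)
    for _ in range(n_steps):
        residual = y - X @ beta
        beta += step * (ridge_op @ residual)  # shrunken ridge fit to residuals
    return beta


def simulate_study(rng, n, beta_mean, sigma_beta, noise=1.0):
    """One study; its coefficients deviate from beta_mean by sigma_beta (assumed model)."""
    p = beta_mean.shape[0]
    beta = beta_mean + sigma_beta * rng.standard_normal(p)
    X = rng.standard_normal((n, p))
    y = X @ beta + noise * rng.standard_normal(n)
    return X, y


def compare(sigma_beta, n_studies=5, n=100, p=10, seed=0):
    """Return test MSE of (merged model, equal-weight ensemble) on a held-out study."""
    rng = np.random.default_rng(seed)
    beta_mean = np.ones(p)
    train = [simulate_study(rng, n, beta_mean, sigma_beta) for _ in range(n_studies)]
    X_test, y_test = simulate_study(rng, n, beta_mean, sigma_beta)

    # (1) Merging: stack all training studies and fit a single model.
    X_all = np.vstack([X for X, _ in train])
    y_all = np.concatenate([y for _, y in train])
    pred_merged = X_test @ fit_ridge_boosting(X_all, y_all)

    # (2) Cross-study learning: one model per study, predictions averaged.
    betas = [fit_ridge_boosting(X, y) for X, y in train]
    pred_ens = np.mean([X_test @ b for b in betas], axis=0)

    mse_merged = float(np.mean((y_test - pred_merged) ** 2))
    mse_ens = float(np.mean((y_test - pred_ens) ** 2))
    return mse_merged, mse_ens


if __name__ == "__main__":
    # Sweep the (hypothetical) heterogeneity level and print both errors.
    for s in (0.0, 0.5, 1.0, 2.0):
        print(s, compare(sigma_beta=s))
```

Sweeping sigma_beta in this way is one informal way to look for a heterogeneity level at which the ranking of the two strategies changes. Any such pattern in this toy setup depends on the assumptions above and should not be read as the transition point characterized in the talk.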
Authors who are presenting talks have a * after their name.