Abstract Details

Activity Number: 256 - Contributed Poster Presentations: Section on Statistical Learning and Data Science
Type: Contributed
Date/Time: Monday, July 29, 2019 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #307321
Title: Aggregated Single-Study Learners for Generalizable Predictions
Author(s): Boyu Ren* and Lorenzo Trippa and Giovanni Parmigiani
Companies: and Dana-Farber Cancer Institute and Dana-Farber Cancer Institute
Keywords: statistical replicability and reproducibility; stacking regression; hierarchical model; environmental health

Replicability of the performance of prediction models across different studies has received increasing attention in the scientific community. Recently, a method based on ensemble learning was proposed to address this problem. It uses stacking to build a consensus prediction model from ensembles of prediction models trained on different studies. In this paper, we propose a theoretical framework that justifies the advantage of this consensus model in terms of performance replicability. The framework also points to a more principled approach for combining ensembles of prediction models, one that overcomes potential over-fitting. We also explore visualizing study heterogeneity via the consensus prediction model, which provides crucial information about the reliability of the consensus model in a given subset of the covariate space. We then apply our approach to an air pollution dataset to predict mortality rates from air quality. We show that our predictor is robust to geographical heterogeneity in the effects of pollutants and, by borrowing information across cities, achieves better prediction accuracy than city-specific prediction models.
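
The general idea of stacking single-study learners can be illustrated with a minimal Python sketch. It is not the authors' implementation: the data are simulated, the single-study learners are assumed to be ordinary least-squares fits, and every function and variable name is hypothetical. The stacking weights here are fit by unconstrained least squares on the same pooled data used to train the learners, which is the naive variant that is exposed to the over-fitting the abstract mentions.

```python
import numpy as np


def add_intercept(X):
    """Prepend a column of ones so the linear model has an intercept."""
    return np.column_stack([np.ones(X.shape[0]), X])


def fit_single_study(X, y):
    """Fit an ordinary least-squares learner on one study (assumed learner)."""
    coef, *_ = np.linalg.lstsq(add_intercept(X), y, rcond=None)
    return coef


def predict(coef, X):
    """Predict with a fitted single-study learner."""
    return add_intercept(X) @ coef


def stacking_weights(learners, X, y):
    """Regress the outcome on the learners' predictions to get weights."""
    P = np.column_stack([predict(c, X) for c in learners])
    w, *_ = np.linalg.lstsq(P, y, rcond=None)
    return w


# Simulated studies (illustrative only): each study has its own
# coefficients, drawn around a shared mean to mimic heterogeneity.
rng = np.random.default_rng(0)
n_studies, n_obs, n_cov = 4, 200, 3
studies = []
for _ in range(n_studies):
    beta = np.array([1.0, -0.5, 2.0]) + rng.normal(scale=0.3, size=n_cov)
    X = rng.normal(size=(n_obs, n_cov))
    y = X @ beta + rng.normal(scale=0.5, size=n_obs)
    studies.append((X, y))

# One learner per study, then weights fit on the pooled data.
learners = [fit_single_study(X, y) for X, y in studies]
X_all = np.vstack([X for X, _ in studies])
y_all = np.concatenate([y for _, y in studies])
weights = stacking_weights(learners, X_all, y_all)


def consensus_predict(X_new):
    """Weighted combination of all single-study predictions."""
    P = np.column_stack([predict(c, X_new) for c in learners])
    return P @ weights


stacked_mse = float(np.mean((consensus_predict(X_all) - y_all) ** 2))
single_mse = [
    float(np.mean((predict(c, X_all) - y_all) ** 2)) for c in learners
]
print("stacking weights:", np.round(weights, 3))
print("pooled MSE, consensus:", round(stacked_mse, 4))
print("pooled MSE, single-study learners:", np.round(single_mse, 4))
```

Because putting all weight on any one learner is a special case of the weighted combination, the consensus model's in-sample pooled error cannot exceed that of the best single-study learner. Performance on a new study is a separate question, and held-out studies or a constrained weight fit would be natural variations to try.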

Authors who are presenting talks have a * after their name.

Back to the full JSM 2019 program