Abstract:
|
Ensemble learning is a mainstay in modern data science practice. Conventional ensemble algorithms assign to base models a set of deterministic, constant model weights that (1) do not fully account for individual models’ varying accuracy across data subgroups, nor (2) provide uncertainty estimates for the ensemble prediction, which could result in mis-calibrated (i.e. precise but biased) predictions that could in turn negatively impact the algorithm performance in real-word applications. In this work, we present an adaptive, probabilistic approach to ensemble learning using a transformed Gaussian process as a prior for the ensemble weights. Given input feature, our method optimally combines base models based on their predictive accuracy in the feature space, and provides interpretable uncertainty estimates both in model selection and in ensemble prediction. We illustrate the utility of our method on the real-world application of spatiotemporal integration of particle pollution prediction models in greater Boston region.
|