Abstract:
|
Boosting offers an approach to obtaining non-parametric regression estimators that scale to applications with many explanatory variables. Despite its popularity and practical success, boosting is known to provide poor estimates when the data contain outliers. We present a two-stage robust boosting algorithm that first minimizes a robust residual scale estimator and then improves it by optimizing a bounded loss function. Unlike previous robust boosting proposals, this approach does not require computing an ad hoc residual scale estimator in each boosting iteration. The effectiveness of our method is illustrated on simulated and benchmark data, and it compares favorably to existing methods: with clean data, our method performs as well as gradient boosting with the squared loss; with symmetrically and asymmetrically contaminated data, our proposal outperforms other boosting methods (robust or otherwise) in terms of prediction error. The proposed method is implemented in the R package "RRBoost", available from CRAN.
|