In linear models, a local disturbance in the data (an outlier) can have a substantial effect on parameter estimates and hence a global effect on fits and predictions, since the parameters affect the fit everywhere.
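As a minimal illustration of this point (not taken from the present work), the sketch below fits an ordinary least squares line with NumPy, corrupts a single observation at one end of the predictor range, and refits. Both the slope and the intercept shift, so predictions change even at points far from the corrupted observation.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + rng.normal(scale=0.5, size=x.size)

# Fit on clean data
slope_clean, intercept_clean = np.polyfit(x, y, 1)

# Corrupt a single observation at the right end of the range
y_out = y.copy()
y_out[-1] += 50.0
slope_out, intercept_out = np.polyfit(x, y_out, 1)

# The local disturbance moves the whole fitted line: the intercept
# (the prediction at x = 0, far from the outlier) changes as well.
print("slope:     ", slope_clean, "->", slope_out)
print("intercept: ", intercept_clean, "->", intercept_out)
```

Because the least squares slope is a weighted sum of all responses, a single discordant value with high leverage pulls the entire fitted line, which is the global effect the passage above describes.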
In modern statistical modelling and machine learning, complex models with many parameters are fit using computationally intensive algorithms for ``learning'' the models. Discussions of outliers and influence are uncommon in the context of such models; the emphasis is on out-of-sample prediction and the automatic nature of the prediction mechanism. Nevertheless, the need for diagnostics still seems compelling, and ignoring the potential for influential observations would be justified only if the models being fit were intrinsically more robust to outliers.
In this work, we study the effects of locally discordant observations on the global fit of Bayesian ensemble models. In particular, we look at BART (Bayesian Additive Regression Trees) and the recently developed heteroskedastic versions of BART, which search for a variance function as well as a mean function.