Outliers in Non-Parametric Estimation of Treatment Effects
Darwin Ugarte, Economic School of Louvain, Centre for Research in the Economics of Development (CRED), FUNDP, Namur

Keywords: Treatment effects, Outliers, Propensity score, Mahalanobis distance

Methods for estimating average treatment effects of a binary treatment under the unconfoundedness assumption are valued tools in empirical program evaluation. As semiparametric techniques, these methods estimate parametrically the “metrics” (the propensity score and the Mahalanobis distance) used to define and compare observations with similar covariates, while leaving the relationship between the outcome and the metric nonparametric.
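For reference, the two metrics can be written in their standard textbook form. The notation here is an assumption and is not taken from the paper: T is the binary treatment indicator, X the covariate vector, and S a covariance matrix of the covariates.

```latex
% Propensity score: probability of treatment given covariates.
p(x) = \Pr(T = 1 \mid X = x)

% Mahalanobis distance between the covariate vectors of units i and j.
d_M(x_i, x_j) = \sqrt{(x_i - x_j)^{\top} S^{-1} (x_i - x_j)}
```

Both quantities rest on parametric ingredients: a binary-response model fitted to estimate p(x) in the first case, and the estimated matrix S in the second.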

It is well known that, in regression analysis, multivariate outliers can distort estimators and yield unreliable results, and that outliers in a multivariate point cloud can be particularly hard to detect. Three types are distinguished: [1] outliers in the y-dimension (vertical outliers), [2] outlying values in the covariates (the x-dimension) but not in the y-dimension (good leverage points), and [3] outlying values in both the y- and x-dimensions (bad leverage points). In this paper we show that good and bad leverage points bias the parametric estimation of the metric used to define good counterfactuals, whereas vertical outliers bias the nonparametric estimation of the treatment effect. Monte Carlo simulations show that bad leverage points bias average treatment effect estimates upward because they change the distribution of the metrics used to define counterfactuals. Good leverage points in the treated sample bias average treatment effect estimates downward, whereas good leverage points in the control sample do not affect them. Vertical outliers in the outcome bias average treatment effect estimates severely.
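The effect of vertical outliers can be illustrated with a minimal simulation sketch. It is not the paper's Monte Carlo design: the data-generating process, the single covariate used directly as the matching metric, and the helper names (`simulate`, `att_nearest_neighbor`, `add_vertical_outliers`) are all assumptions made for illustration.

```python
import numpy as np

def simulate(n=500, effect=1.0, seed=0):
    """Hypothetical design: one covariate, logistic assignment, constant effect."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    p = 1.0 / (1.0 + np.exp(-x))               # true propensity score
    t = (rng.uniform(size=n) < p).astype(int)  # binary treatment
    y = effect * t + x + rng.normal(scale=0.5, size=n)
    return y, t, x

def att_nearest_neighbor(y, t, metric):
    """ATT by one-to-one nearest-neighbor matching (with replacement) on a scalar metric."""
    treated = np.flatnonzero(t == 1)
    controls = np.flatnonzero(t == 0)
    # For each treated unit, pick the control with the closest metric value.
    gaps = np.abs(metric[treated][:, None] - metric[controls][None, :])
    matches = controls[gaps.argmin(axis=1)]
    return float(np.mean(y[treated] - y[matches]))

def add_vertical_outliers(y, t, share=0.05, shift=20.0):
    """Shift the outcome of a share of treated units; covariates are untouched."""
    y_out = y.copy()
    treated = np.flatnonzero(t == 1)
    k = max(1, int(round(share * treated.size)))
    y_out[treated[:k]] += shift
    return y_out, k

y, t, x = simulate()
att_clean = att_nearest_neighbor(y, t, x)
y_dirty, k = add_vertical_outliers(y, t)
att_dirty = att_nearest_neighbor(y_dirty, t, x)
print(f"clean ATT: {att_clean:.3f}  contaminated ATT: {att_dirty:.3f}")
```

Because the outliers leave the covariates unchanged, the matched pairs are identical in both runs, and the estimate moves by exactly the contaminated share of treated units times the shift. Leverage points act differently, since they alter the metric itself and therefore the matches.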

We examine the relative performance of several semiparametric estimators of average treatment effects in the presence of outliers; all are biased. We suggest ways to diagnose the presence of outliers and propose a reweighting estimator, robust against outliers, based on a multivariate estimator of scale and location. Our motivating application estimates the impact of an Indian health microinsurance project.
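The abstract does not specify which robust estimator of scale and location is used, so the following is only a sketch of how outlier-downweighting could be built from one. It uses simplified concentration steps in the spirit of the minimum covariance determinant, restricted to two covariates so that the chi-square cutoff has a closed form. The function names, the 75% retention share, and the 97.5% cutoff are assumptions.

```python
import numpy as np

def squared_distances(X, center, scatter):
    """Squared Mahalanobis distance of each row of X from center."""
    diff = X - center
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(scatter), diff)

def robust_weights(X, keep=0.75, n_steps=20):
    """0/1 weights that drop points far from a robustly estimated center.

    Simplified stand-in for a robust location/scatter estimator,
    written for exactly two covariates.
    """
    n, p = X.shape
    assert p == 2, "the cutoff below is the closed-form chi-square(2) quantile"
    h = int(keep * n)
    # Robust start: coordinate-wise median and MAD-based diagonal scatter.
    center = np.median(X, axis=0)
    mad = 1.4826 * np.median(np.abs(X - center), axis=0)
    scatter = np.diag(mad ** 2)
    for _ in range(n_steps):
        # Concentration step: refit on the h points closest to the current fit.
        d2 = squared_distances(X, center, scatter)
        subset = np.argsort(d2)[:h]
        center = X[subset].mean(axis=0)
        scatter = np.cov(X[subset], rowvar=False)
    d2 = squared_distances(X, center, scatter)
    # Rescale so the median distance equals the chi-square(2) median, 2*ln(2).
    d2 = d2 * (2.0 * np.log(2.0)) / np.median(d2)
    cutoff = -2.0 * np.log(0.025)  # 97.5% quantile of chi-square(2)
    return (d2 <= cutoff).astype(float), d2

rng = np.random.default_rng(1)
clean = rng.normal(size=(200, 2))
planted = np.array([8.0, 8.0]) + 0.1 * rng.normal(size=(10, 2))  # leverage points
X = np.vstack([clean, planted])
weights, d2 = robust_weights(X)
print("clean points dropped:", int((weights[:200] == 0).sum()))
print("planted points dropped:", int((weights[200:] == 0).sum()))
```

One possible use, again an assumption rather than the paper's procedure, is to pass such weights to the parametric first stage so that flagged observations do not influence the estimated metric.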