Abstract:
|
This paper considers estimation of and inference for the average treatment effect under linear model selection when a large number of covariates are present. While the estimation bias of an under-fitted model is well recognized in the literature, we address a lesser-known bias that arises from an over-fitted model. In many settings, perfect model selection is too much to expect, even asymptotically. However, model selection can often be carried out so as to avoid under-fitting at the cost of over-fitting. We show that the over-fitting bias can be reduced or eliminated through data splitting and, more importantly, that smoothing over random data splits or bootstrap-induced splits is needed to mitigate the loss of efficiency caused by data splitting. Under appropriate conditions, we show that the smoothed post-selection estimator of the average treatment effect is asymptotically normal and that its variance can be well estimated by the nonparametric delta method.
|
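As a rough illustration of the split-and-smooth idea described in the abstract, the following sketch selects covariates on one random half of the data, estimates the treatment coefficient by least squares on the other half, and averages the result over many random splits. It is only a minimal sketch under stated assumptions: the function name `split_smoothed_ate`, the correlation-screening selector, the parameter `n_keep`, and the simulated data are all hypothetical choices made for illustration and are not taken from the paper. The variance estimate by the nonparametric delta method is not implemented here.

```python
import numpy as np


def split_smoothed_ate(y, d, X, n_splits=50, n_keep=5, seed=0):
    """Average a split-sample post-selection estimate over random splits.

    y: outcomes (n,), d: binary treatment (n,), X: covariates (n, p).
    The selector (marginal correlation screening) and n_keep are
    hypothetical stand-ins for whatever selection rule is actually used.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    estimates = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        sel, est = perm[: n // 2], perm[n // 2:]
        # Selection half: keep the n_keep covariates most correlated with y.
        Xs = X[sel] - X[sel].mean(axis=0)
        ys = y[sel] - y[sel].mean()
        score = np.abs(Xs.T @ ys) / (np.linalg.norm(Xs, axis=0) + 1e-12)
        keep = np.argsort(score)[-n_keep:]
        # Estimation half: OLS of y on intercept, treatment, selected covariates.
        Z = np.column_stack([np.ones(len(est)), d[est], X[est][:, keep]])
        beta, *_ = np.linalg.lstsq(Z, y[est], rcond=None)
        estimates.append(beta[1])  # coefficient on the treatment indicator
    estimates = np.asarray(estimates)
    # Smoothing step: average the per-split estimates.
    return float(estimates.mean()), estimates


# Simulated data (an assumption for illustration): randomized treatment,
# true effect 1.0, and only the first two of 50 covariates matter.
rng = np.random.default_rng(1)
n, p = 400, 50
X = rng.normal(size=(n, p))
d = rng.integers(0, 2, size=n).astype(float)
y = 1.0 * d + X[:, 0] - X[:, 1] + rng.normal(size=n)

ate_hat, per_split = split_smoothed_ate(y, d, X)
print(f"smoothed estimate: {ate_hat:.3f}")
print(f"spread across single splits (sd): {per_split.std():.3f}")
```

Because selection and estimation never use the same observations within a split, the selected model cannot be tuned to the noise in the estimation half. Averaging over splits is what recovers the efficiency that any single half-sample estimate gives up.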