Abstract:
|
In surveys, errors such as selection bias, nonresponse, and noncoverage can all lead to biased estimates. This paper focuses on selection bias, which may be self-inflicted through erroneous sample selection or may arise from missing not at random (MNAR) nonresponse. For example, tobacco use surveys may be subject to selection bias because young males, who are more prone to tobacco use, are also less likely to participate; and surveys of domestic violence with an unbalanced sample of older females could yield biased results because prevalence is highly correlated with age and gender. The common approach to mitigating bias, weighting adjustments justified by models for response propensity, may increase the variance of weighted estimates. This paper empirically examines the bias and variance of estimates obtained with gradient boosting, a popular statistical learning method, used here to develop weighting adjustments that take into account the correlation between survey outcomes and response propensity. Simulations are used to study the impact on bias and variance in three settings: 1) missing at random; 2) MNAR with a partially specified model; and 3) MNAR with selection bias and a partially specified model.
|