Abstract:
|
Survey nonresponse has become a growing problem not only in demographic and economic surveys but also in surveys of state, county, and local governments. Higher nonresponse rates can lead to greater nonresponse bias in survey estimates. The standard way to mitigate this bias is to incorporate response propensities into sampling weight adjustments, and these propensities are traditionally estimated with logistic regression. In this paper, we introduce a relatively new data mining technique, the random forest model, to predict response propensities from the available historical data. The sample design then oversamples units with high response propensity scores. In our sample design, the random forest model offers measurable improvements over the logistic regression model, as demonstrated by differences in misclassification rates. We apply the method to non-property taxes collected in the 2002 and 2007 Censuses of State, County, and Local Governments.
|
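The two quantities the abstract relies on, a propensity-based weight adjustment and a misclassification rate, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `adjust_weights` and `misclassification_rate`, the 0.5 classification threshold, and the four-unit toy data are all assumptions made for illustration, and the propensities are taken as given rather than fitted by a logistic regression or random forest model.

```python
def adjust_weights(base_weights, propensities, responded):
    """Inverse-propensity nonresponse adjustment (hypothetical helper).

    Each respondent's base weight is divided by its estimated response
    propensity; nonrespondents receive a weight of zero.
    """
    adjusted = []
    for weight, propensity, did_respond in zip(base_weights, propensities, responded):
        if not 0.0 < propensity <= 1.0:
            raise ValueError("propensity must lie in (0, 1]")
        adjusted.append(weight / propensity if did_respond else 0.0)
    return adjusted


def misclassification_rate(propensities, responded, threshold=0.5):
    """Share of units whose predicted response status is wrong.

    A unit is predicted to respond when its propensity is at least
    `threshold` (0.5 is an assumed cutoff, not taken from the paper).
    """
    predicted = [p >= threshold for p in propensities]
    errors = sum(pred != bool(actual) for pred, actual in zip(predicted, responded))
    return errors / len(responded)


# Made-up toy data: four sampled units.
base_weights = [10.0, 10.0, 20.0, 20.0]
propensities = [0.5, 0.75, 0.25, 0.125]   # assumed model output
responded = [True, False, True, False]

adjusted = adjust_weights(base_weights, propensities, responded)
error_rate = misclassification_rate(propensities, responded)
```

Comparing two propensity models under this sketch would amount to calling `misclassification_rate` once per model on the same units and contrasting the results; a lower rate indicates propensities that separate respondents from nonrespondents more accurately at the chosen threshold.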