Abstract:
|
The Canadian Census edits and imputes missing and erroneous data using a nearest-neighbor donor imputation methodology. The choice of auxiliary variables, and of the weights assigned to them in the similarity measure used by the nearest-neighbor algorithm, can have a large impact on the quality of the imputation. In the past, this choice was guided mainly by subject-matter expertise. For the 2016 Census, however, it was decided that feature selection would be used to aid in the choice of auxiliary variables, in particular for some questions related to immigration. This paper describes and evaluates the method chosen for the 2016 Census, the Relief algorithm, and compares it with other feature selection methods. The methods are tested in Monte Carlo simulation studies, under various response mechanisms, using data on immigration category taken from the 2016 Census.
|
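
To illustrate the kind of feature weighting the abstract refers to, the following is a minimal sketch of the basic two-class Relief algorithm for numeric features. It is not the Census implementation: the function name `relief_weights`, its parameters, and the toy data are assumptions made for illustration, and a multi-class, categorical target such as immigration category would in practice call for an extension of Relief rather than this basic form.

```python
import random


def relief_weights(X, y, n_iter=100, seed=0):
    """Basic two-class Relief on numeric features (illustrative sketch only).

    X is a list of rows (lists of numbers); y is a list of class labels.
    Returns one weight per feature; larger means more relevant to the class.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])

    # Feature ranges, used to scale each per-feature difference into [0, 1].
    spans = []
    for j in range(p):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(j, a, b):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        # Manhattan distance on the scaled features.
        return sum(diff(j, a, b) for j in range(p))

    w = [0.0] * p
    for _ in range(n_iter):
        i = rng.randrange(n)
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue  # cannot form a hit/miss pair for this instance
        hit = min(hits, key=lambda k: dist(X[i], X[k]))    # nearest same-class row
        miss = min(misses, key=lambda k: dist(X[i], X[k]))  # nearest other-class row
        for j in range(p):
            # Reward features that separate the classes; penalize features
            # that differ between close neighbors of the same class.
            w[j] += (diff(j, X[i], X[miss]) - diff(j, X[i], X[hit])) / n_iter
    return w


# Hypothetical toy data: feature 0 equals the class, feature 1 is unrelated to it.
X_toy = [[i % 2, (i // 2) / 19] for i in range(40)]
y_toy = [i % 2 for i in range(40)]
weights = relief_weights(X_toy, y_toy)
```

On this toy data the class-defining feature receives a much larger weight than the unrelated one. Weights of this kind could then inform which auxiliary variables enter the nearest-neighbor similarity measure and how heavily each is weighted.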