Name: 2019 Joint Statistical Meetings
Start: 2019-07-27T07:00:00+00:00
End: 2019-08-01
Location: Colorado Convention Center

Activity Number:	119 - Statistical Data Editing Modernisation
Type:	Topic Contributed
Date/Time:	Monday, July 29, 2019 : 8:30 AM to 10:20 AM
Sponsor:	Government Statistics Section
Abstract #306628	Presentation
Title:	Improving Edit and Imputation Strategies Through Feature Selection
Author(s):	Andrew Stelmack*
Companies:	Statistics Canada
Keywords:	Imputation; Nearest Neighbor; Feature Selection; CANCEIS; Information Theory
Abstract:	The Canadian Census edits and imputes missing and erroneous data using nearest-neighbor donor imputation. The choice of auxiliary variables, and of the weights assigned to them in the similarity measure used by the nearest-neighbor algorithm, can have a large impact on the quality of the imputation. In the past, this choice was driven mainly by subject-matter expertise. For the 2016 Census, however, and in particular for some questions related to immigration, it was decided to use feature selection to help choose the auxiliary variables. This paper describes and evaluates the method chosen for the 2016 Census, the Relief algorithm, and tests and compares it with other feature selection methods. The methods are tested through Monte Carlo simulation studies under various response mechanisms, using data on immigration category taken from the 2016 Census.
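As a rough illustration of the two ideas in the abstract, the Python sketch below computes basic Relief weights for categorical auxiliary variables and then plugs them into a weighted nearest-neighbor distance to select a donor. It is not the Statistics Canada or CANCEIS implementation. The 0/1 mismatch distance, taking the nearest miss from any other class (a simplification for multi-class targets), clipping negative weights to zero, the function names `relief` and `impute_from_donor`, and the toy data are all assumptions made for illustration.

```python
import random

def diff(a, b):
    # Categorical per-variable difference: 0 if the values agree, 1 otherwise.
    return 0.0 if a == b else 1.0

def relief(X, y, n_iter, seed=0):
    """Basic Relief weights for categorical auxiliary variables (sketch).

    X is a list of equal-length tuples, y the matching class labels.
    Assumption: the nearest miss is taken from any other class, a
    simplification for multi-class targets."""
    rng = random.Random(seed)
    p = len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        i = rng.randrange(len(X))

        def dist(j):
            # Unweighted mismatch count between record i and record j.
            return sum(diff(X[i][k], X[j][k]) for k in range(p))

        hits = [j for j in range(len(X)) if j != i and y[j] == y[i]]
        misses = [j for j in range(len(X)) if y[j] != y[i]]
        if not hits or not misses:
            continue
        h = min(hits, key=dist)    # nearest record of the same class
        m = min(misses, key=dist)  # nearest record of a different class
        for k in range(p):
            # Reward variables that differ across classes and penalize
            # variables that differ between near neighbors of the same class.
            w[k] += (diff(X[i][k], X[m][k]) - diff(X[i][k], X[h][k])) / n_iter
    return w

def impute_from_donor(recipient, donors, weights):
    """Return the target value of the donor closest to the recipient.

    donors is a list of (auxiliary_tuple, target_value) pairs. The distance
    is a hypothetical weighted mismatch count; ties go to the earlier donor."""
    def wdist(donor):
        aux, _ = donor
        return sum(wk * diff(recipient[k], aux[k]) for k, wk in enumerate(weights))
    return min(donors, key=wdist)[1]

if __name__ == "__main__":
    # Toy data: variable 0 determines the class, variable 1 is noise.
    X = [(0, 0), (0, 1), (1, 0), (1, 1)]
    y = ["a", "a", "b", "b"]
    w = relief(X, y, n_iter=4)
    print(w)  # [1.0, -1.0]
    weights = [max(0.0, wk) for wk in w]  # negative weights contribute nothing
    donors = [((0, 0), "a"), ((1, 1), "b")]
    print(impute_from_donor((1, 0), donors, weights))  # b
```

With these weights the noise variable is effectively ignored, so the recipient `(1, 0)` is matched to the donor that agrees with it on the predictive variable 0. This is the kind of effect feature selection is meant to have on the donor search.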

Authors who are presenting talks have a * after their name.

JSM 2019 Online Program