Online Program

All Times ET

Program is Subject to Change

Thursday, June 17
Thu, Jun 17, 10:30 AM - 12:00 PM
Machine Learning in the Production of Official Economic Statistics

Imputation Through Machine Learning Procedures: An Empirical Comparison (308029)

Mehdi Dagdoug, Université de Bourgogne Franche-Comté 
Camelia Goga, Université de Bourgogne Franche-Comté 
*David Haziza, Université de Montréal 

Keywords: imputation; nonresponse; machine learning; bias; efficiency

Single imputation is commonly used to treat item nonresponse in surveys. Imputation procedures fall into two broad classes: parametric and nonparametric. Parametric procedures, which include linear regression imputation as a special case, rely on parametric imputation models that may be vulnerable to model misspecification. The model may be misspecified if the link function is incorrect or if the model omits interactions or predictors that account for curvature (e.g., quadratic and cubic terms). In contrast, nonparametric procedures tend to be robust to model misspecification, which is a desirable property. Over the last two decades, machine learning procedures have gained popularity in national statistical offices. These procedures, which include regression trees, random forests, generalized additive models, Bayesian additive regression trees, gradient boosting, nearest-neighbour imputation and Cubist, are nonparametric in nature. We will present the results of a simulation study that compares these methods in terms of bias and efficiency in the context of imputation for missing data.
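To make the comparison concrete, the following is a minimal sketch of such a simulation and not the authors' actual study design. It assumes a hypothetical population with a quadratic relationship between y and a single covariate x, a response probability that depends on x, and only two imputation procedures: linear regression imputation (misspecified here by construction) and nearest-neighbour imputation. The function name, sample sizes, and model parameters are all illustrative assumptions.

```python
import numpy as np

def simulate_imputation(n=500, reps=200, seed=0):
    """Monte Carlo bias and MSE of the imputed mean, relative to the
    full-sample mean, for two imputation procedures (illustrative setup)."""
    rng = np.random.default_rng(seed)
    errors = {"linear": [], "nearest_neighbour": []}
    for _ in range(reps):
        # Hypothetical data: y depends on x through a quadratic term.
        x = rng.uniform(0.0, 2.0, size=n)
        y = 1.0 + 2.0 * x**2 + rng.normal(0.0, 0.5, size=n)

        # Response indicator: the response probability decreases with x.
        p = 1.0 / (1.0 + np.exp(-(1.0 - x)))
        r = rng.uniform(size=n) < p
        full_mean = y.mean()

        # Linear regression imputation fitted on respondents only
        # (omits the quadratic term, so the model is misspecified).
        design = np.column_stack([np.ones(r.sum()), x[r]])
        beta, *_ = np.linalg.lstsq(design, y[r], rcond=None)
        y_lin = y.copy()
        y_lin[~r] = beta[0] + beta[1] * x[~r]

        # Nearest-neighbour imputation: copy y from the closest respondent in x.
        donor = np.abs(x[~r][:, None] - x[r][None, :]).argmin(axis=1)
        y_nn = y.copy()
        y_nn[~r] = y[r][donor]

        errors["linear"].append(y_lin.mean() - full_mean)
        errors["nearest_neighbour"].append(y_nn.mean() - full_mean)

    # Summarize each procedure by its Monte Carlo bias and mean squared error.
    return {
        name: {"bias": float(np.mean(e)), "mse": float(np.mean(np.square(e)))}
        for name, e in errors.items()
    }

results = simulate_imputation()
for name, summary in results.items():
    print(name, summary)
```

Under these assumptions, the linear imputation model ignores the curvature in y, so one would expect it to show a larger bias than nearest-neighbour imputation. A fuller study along the lines of the abstract would add the other procedures listed above and vary the data-generating model.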