Imputation Through Machine Learning Procedures: An Empirical Comparison (308029)
Mehdi Dagdoug, Université de Bourgogne Franche-Comté
Camelia Goga, Université de Bourgogne Franche-Comté
*David Haziza, Université de Montréal
Keywords: Imputation; nonresponse; machine learning; bias; efficiency
Single imputation is commonly used to treat item nonresponse in surveys. Imputation procedures may be classified into two broad classes: parametric procedures and nonparametric procedures. Parametric procedures, which include linear regression imputation as a special case, are based on parametric imputation models that may be vulnerable to model misspecification. The model may be misspecified if the link function is misspecified or if the model fails to include interactions or predictors that account for curvature (e.g., quadratic and cubic terms). In contrast, nonparametric procedures tend to be robust to model misspecification, which is a desirable property. In the last two decades, machine learning procedures have gained popularity in national statistical offices. These procedures, which include regression trees, random forests, generalized additive models, Bayesian additive regression trees, gradient boosting, nearest-neighbour imputation and Cubist as special cases, are nonparametric in nature. We will present the results of a simulation study that compares these methods in terms of bias and efficiency in the context of imputation for missing data.
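To make the misspecification argument concrete, here is a minimal sketch contrasting a parametric method (linear regression imputation) with a nonparametric one (nearest-neighbour imputation). It is not the simulation design used in the study: the single covariate x ~ Uniform(-1, 1), the outcome y = 4x^2 + noise, the response probabilities (0.9 when x > 0, 0.1 otherwise) and the function names `linear_regression_impute`, `nearest_neighbour_impute` and `simulate_bias` are all hypothetical choices made for illustration. The linear model omits the quadratic term, which is exactly the "missing curvature" case described above, and bias is measured as the Monte Carlo average of the imputed mean minus the full-data mean.

```python
import numpy as np


def linear_regression_impute(x, y, responded):
    """Parametric: impute missing y with OLS predictions fitted on respondents."""
    X = np.column_stack([np.ones_like(x), x])  # intercept + linear term only (no x^2)
    beta, *_ = np.linalg.lstsq(X[responded], y[responded], rcond=None)
    y_imp = y.copy()
    y_imp[~responded] = X[~responded] @ beta  # deterministic predicted values
    return y_imp


def nearest_neighbour_impute(x, y, responded):
    """Nonparametric: impute missing y with the value of the closest respondent in x."""
    x_obs, y_obs = x[responded], y[responded]
    dist = np.abs(x[~responded][:, None] - x_obs[None, :])  # nonrespondents x respondents
    donors = dist.argmin(axis=1)  # index of the nearest donor for each nonrespondent
    y_imp = y.copy()
    y_imp[~responded] = y_obs[donors]
    return y_imp


def simulate_bias(n=500, n_reps=200, seed=0):
    """Monte Carlo bias of the imputed mean relative to the full-data mean (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    errors = {"linear": [], "nn": []}
    for _ in range(n_reps):
        x = rng.uniform(-1.0, 1.0, n)
        y = 4.0 * x**2 + rng.normal(0.0, 0.5, n)  # curvature the linear model omits
        p = np.where(x > 0, 0.9, 0.1)  # response probability depends on x
        responded = rng.uniform(size=n) < p
        full_mean = y.mean()  # the mean we would compute with no nonresponse
        errors["linear"].append(linear_regression_impute(x, y, responded).mean() - full_mean)
        errors["nn"].append(nearest_neighbour_impute(x, y, responded).mean() - full_mean)
    return {k: float(np.mean(v)) for k, v in errors.items()}


if __name__ == "__main__":
    print(simulate_bias())
```

Under these assumptions the linear fit is driven mostly by respondents with x > 0, so it predicts values that are far too low for the nonrespondents concentrated at x < 0, and the imputed mean should show a large negative bias. Nearest-neighbour imputation only borrows values from nearby respondents, so its bias should stay close to zero, at the cost of reusing a small number of donors (which affects efficiency rather than bias).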