Abstract:
|
Multiple imputation [Rubin, 1987] is difficult to conduct when the analysis model includes interactions, squares, or other transformations of variables with missing values, for two reasons. First, the imputer must know the analysis model in order to address the congeniality issue [Meng, 1994]. Second, the imputer faces a trade-off. The passive-imputation algorithm (van Buuren and Groothuis-Oudshoorn [1999], a.k.a. impute, then transform) produces biased parameter estimates, even when data are missing completely at random. The just-another-variable algorithm (von Hippel [2009], a.k.a. transform, then impute) produces inconsistent data relations, in which an imputed transformed variable no longer matches the transformation of its imputed components. Although imputing squares has received some attention [Vink and van Buuren, 2013], the conflict persists for all other nontrivial transformations. We propose a flexible local imputation model that builds upon the locally weighted regression ideas of Cleveland [1979]. Local imputation implicitly captures a broad range of transformations, such as interactions, squares, cubes, roots, and logs. Hence, the imputer need not consider variable transformations; they only need to include all relevant variables in their untransformed form. In a simulation study, we compare the proposed local-imputation algorithm with several alternatives, including random forest imputation [Doove et al., 2014], which also addresses nonlinearities.
|
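The abstract does not specify the local imputation model itself. The sketch below is a rough, hypothetical illustration of why a Cleveland-style local fit can reproduce a square without the imputer ever specifying one. It is not the authors' algorithm. Each missing y is imputed from a tricube-weighted local linear regression on x, plus a normal draw with the local residual spread. The function names `local_fit`, `local_impute`, and `global_linear_mean`, the span of 0.2, and the simulated data are all assumptions made for illustration. For contrast, it also computes predictions from a single global straight line.

```python
import math
import random


def tricube(u):
    """Tricube kernel as used in LOWESS: (1 - |u|^3)^3 for |u| < 1, else 0."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_fit(x0, xs, ys, span):
    """Weighted linear fit of ys on xs around x0 (hypothetical helper).

    Returns (predicted mean at x0, weighted residual SD within the window).
    """
    k = min(len(xs), max(3, math.ceil(span * len(xs))))  # neighbours in window
    h = sorted(abs(x - x0) for x in xs)[k - 1] * 1.0001 or 1e-12  # bandwidth
    w = [tricube((x - x0) / h) for x in xs]
    sw = sum(w)
    xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
    ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = ybar - slope * xbar
    rss = sum(wi * (y - intercept - slope * x) ** 2 for wi, x, y in zip(w, xs, ys))
    return intercept + slope * x0, math.sqrt(rss / sw)


def local_impute(xs, ys, span=0.3, rng=None):
    """Replace None entries in ys with a draw from a local regression on xs.

    Single stochastic imputation only: a proper multiple imputation would also
    draw the local regression parameters and repeat the whole procedure m times.
    """
    rng = rng or random.Random(0)
    ox = [x for x, y in zip(xs, ys) if y is not None]
    oy = [y for y in ys if y is not None]
    completed = []
    for x, y in zip(xs, ys):
        if y is None:
            mean, sd = local_fit(x, ox, oy, span)
            y = rng.gauss(mean, sd)  # add noise matching the local residual SD
        completed.append(y)
    return completed


def global_linear_mean(x0, xs, ys):
    """Prediction at x0 from one ordinary least-squares line (no square term)."""
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum(
        (x - xbar) ** 2 for x in xs
    )
    return ybar + slope * (x0 - xbar)


# Hypothetical data: y depends on x through a square; y is missing completely at random.
rng = random.Random(42)
xs = [rng.uniform(-2, 2) for _ in range(400)]
ys = [x ** 2 + rng.gauss(0, 0.1) for x in xs]
ys_mis = [None if rng.random() < 0.3 else y for y in ys]

completed = local_impute(xs, ys_mis, span=0.2, rng=random.Random(1))
missing = [i for i, y in enumerate(ys_mis) if y is None]
ox = [x for x, y in zip(xs, ys_mis) if y is not None]
oy = [y for y in ys_mis if y is not None]

# Mean absolute distance of imputed values from the true curve x**2.
local_err = sum(abs(completed[i] - xs[i] ** 2) for i in missing) / len(missing)
global_err = sum(
    abs(global_linear_mean(xs[i], ox, oy) - xs[i] ** 2) for i in missing
) / len(missing)
print(f"local: {local_err:.3f}  global linear: {global_err:.3f}")
```

The local fit only ever sees the raw variable x, yet its imputations follow x**2 closely. The global line, which also uses only x, misses the curvature almost entirely. This is the sense in which a local model can stand in for explicitly modelled transformations.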