The American Housing Survey (AHS) currently uses a hot-deck method to impute missing values for approximately 120 variables. Hot-deck imputation for the AHS replaces a non-respondent's missing value with a value reported by a respondent. Imputation is carried out within disjoint subsets of the universe, which we refer to as donor pools. We define the donor pools using auxiliary variables that are available for both respondents and non-respondents. In this paper, we introduce new auxiliary variables and apply cluster analysis to produce improved donor pools that minimize within-pool variation across all variables that share a given set of donor pools. We also generate donor pools for imputing a single variable.
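
As a rough illustration of hot-deck imputation within donor pools, the following minimal Python sketch fills each missing value with a randomly chosen respondent value from the same pool. It is not the AHS production procedure: the function name `hot_deck_impute`, the auxiliary variables `tenure` and `region`, the target variable `rooms`, and the random choice of donor are all assumptions made for the example.

```python
import random
from collections import defaultdict


def hot_deck_impute(records, pool_keys, target, seed=0):
    """Return copies of `records` with missing `target` values filled
    from a random respondent in the same donor pool.

    A donor pool is the set of records sharing the same values of the
    auxiliary variables named in `pool_keys` (hypothetical names).
    """
    rng = random.Random(seed)

    # Collect respondent (non-missing) values by donor pool.
    donors = defaultdict(list)
    for rec in records:
        if rec[target] is not None:
            donors[tuple(rec[k] for k in pool_keys)].append(rec[target])

    completed = []
    for rec in records:
        new_rec = dict(rec)  # do not modify the input records
        if new_rec[target] is None:
            pool = donors.get(tuple(rec[k] for k in pool_keys))
            if pool:  # leave the value missing if the pool has no donors
                new_rec[target] = rng.choice(pool)
        completed.append(new_rec)
    return completed


# Illustrative records only; None marks a missing value.
records = [
    {"tenure": "owner", "region": "NE", "rooms": 6},
    {"tenure": "owner", "region": "NE", "rooms": None},
    {"tenure": "renter", "region": "NE", "rooms": 3},
    {"tenure": "renter", "region": "NE", "rooms": None},
]
completed = hot_deck_impute(records, ["tenure", "region"], "rooms")
```

In this sketch the pools are simply the cross-classification of the auxiliary variables; pools produced by a clustering method could be supplied instead by storing a pool label on each record and passing it as the single pool key.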
We describe the clustering methods used to define the donor pools; these include classification and regression trees (CART), hierarchical agglomerative clustering, and k-means clustering. We compare the current and alternative donor pools by measuring the within-pool variation of the imputed variables under each. We will also compare our results with imputed values generated by multivariate multiple imputation methods.
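
The comparison of donor pools by within-pool variation can be sketched as follows. The text does not state the exact variation measure, so this example assumes the within-pool sum of squared deviations from the pool mean for a single numeric variable; the function name `within_pool_ss`, the pool labels, and the data are hypothetical.

```python
from collections import defaultdict


def within_pool_ss(values, pools):
    """Sum, over donor pools, of squared deviations from each pool's mean.

    `values` holds one numeric variable; `pools` holds the pool label
    assigned to each unit. Smaller totals mean more homogeneous pools.
    """
    groups = defaultdict(list)
    for value, pool in zip(values, pools):
        groups[pool].append(value)

    total = 0.0
    for vals in groups.values():
        mean = sum(vals) / len(vals)
        total += sum((v - mean) ** 2 for v in vals)
    return total


# Illustrative data: the same four units under two pool definitions.
values = [1.0, 2.0, 9.0, 10.0]
current_pools = ["a", "b", "a", "b"]      # mixes small and large values
alternative_pools = ["a", "a", "b", "b"]  # groups similar values together

ss_current = within_pool_ss(values, current_pools)          # 64.0
ss_alternative = within_pool_ss(values, alternative_pools)  # 1.0
```

Under this assumed measure, the alternative pools would be preferred because units within each pool are more alike, so a donor's value is likely to be closer to the value that is missing.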