237 – SPEED: Missing Survey Data: Analysis, Imputation, Design, and Prevention
Does Sequence of Imputed Variables Matter in Hot Deck Imputation for Large-Scale Complex Survey Data?
Amang Sukasih
RTI International
Jean Wang
RTI International
Peter Frechtel
RTI International
Karol Krotki
RTI International, Washington, DC
Hot deck imputation is widely used for complex survey data and is especially popular when many survey items must be imputed. Survey items are frequently correlated, and one goal of imputation is to preserve these relationships. When many variables are imputed, the sequence in which they are imputed must be chosen. Common choices are the order of appearance in the questionnaire (a screener question is imputed before its follow-up questions) and the rate of missing data (items with the lowest rate are imputed first, followed by items with higher rates). Iteratively cycling the imputation may help address association among variables: once all variables with missing values have been imputed, the imputation is rerun with previously imputed values in the covariates treated as reported values. This presentation discusses results from an investigation of the sensitivity of final estimates to the sequence of imputed variables. We also measure the impact of factors such as missing data rates, the number of levels in categorical variables, and imputation cycles. We use empirical simulation and focus on bias reduction and preservation of relationships among variables.
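The sequencing and cycling ideas described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that missing values are coded as None, that donors are drawn at random within imputation cells formed by cross-classifying the covariates, that an empty cell falls back to all reported values, and that at least one reported value exists for each variable. All function and variable names (missing_rate, order_by_missing_rate, hot_deck, sequential_hot_deck, complete_vars) are hypothetical.

```python
import random
from collections import defaultdict


def missing_rate(records, var):
    """Share of records whose value for `var` is missing (None)."""
    return sum(r[var] is None for r in records) / len(records)


def order_by_missing_rate(records, variables):
    """Sequence variables so the lowest missing-data rate comes first."""
    return sorted(variables, key=lambda v: missing_rate(records, v))


def hot_deck(data, target, class_vars, recipients, rng):
    """Fill `target` for recipient rows with a random donor value.

    Donors are rows with a reported (never imputed) value of `target`.
    Assumption: an empty imputation cell falls back to all donors.
    """
    cells = defaultdict(list)
    all_donors = []
    for i, row in enumerate(data):
        if i not in recipients and row[target] is not None:
            cells[tuple(row[v] for v in class_vars)].append(row[target])
            all_donors.append(row[target])
    for i in sorted(recipients):
        key = tuple(data[i][v] for v in class_vars)
        pool = cells.get(key) or all_donors
        data[i][target] = rng.choice(pool)


def sequential_hot_deck(records, order, complete_vars, cycles=1, seed=0):
    """Impute variables in `order`, optionally cycling.

    Cycle 0 uses the fully observed covariates plus the variables
    already imputed earlier in the sequence. Later cycles re-impute
    the originally missing values using all other variables, treating
    previously imputed covariate values as if they were reported.
    """
    rng = random.Random(seed)
    data = [dict(r) for r in records]  # leave the input untouched
    missing = {
        v: {i for i, r in enumerate(records) if r[v] is None} for v in order
    }
    for cycle in range(cycles):
        for pos, target in enumerate(order):
            if cycle == 0:
                others = list(order[:pos])
            else:
                others = [v for v in order if v != target]
            class_vars = list(complete_vars) + others
            hot_deck(data, target, class_vars, missing[target], rng)
    return data
```

Running sequential_hot_deck once with a questionnaire-based order and once with the order returned by order_by_missing_rate, then comparing the resulting estimates, gives a small-scale version of the sensitivity comparison the abstract describes. Survey weights and complex design features are deliberately left out of this sketch.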