Abstract:
|
Surveys often collect detailed breakdowns of totals, such as total product sales by product type. These multinomial data are often sparsely reported, with wide variability in proportions. Additionally, true zeros exist and differ across units even within an industry; for example, one establishment sells jeans but not shoes, and another sells shoes but not socks. Large fractions of missing data are common for these breakdowns even when totals are fully observed. Hot deck imputation, which fills in missing data with observed data values, is an attractive option because the whole set of proportions can be imputed at once, preserving multinomial distributions and zero values. However, it is unclear which variant of hot deck is best. We describe a large set of ‘flavors’ of hot deck and compare them via simulation and application to data from the Economic Census. Methods of finding a donor include choosing the single nearest neighbor, choosing from among five neighbors, or using all units. We also consider different ways to impute from the donor: directly imputing the donor’s proportions or imputing a draw from the donor’s distribution. We consider scenarios both with and without a strong predictor of the multinomial distributions.
|