Abstract:
|
Missing data imputation is often an important step in complex survey sampling, where complete data on key variables for every case on the frame are desirable. The American Voices Project uses address-based sampling from an address frame provided by a licensed vendor. However, as learned in the pilot study, the address list has missing data on key variables (e.g., income, education, race/ethnicity), which must be imputed to complete the frame for sampling. During the pilot study, missing values were imputed using the traditional hot-deck method, which fills each missing value with the observed value from a donor case with similar characteristics. With the 2018 pilot study data, we can compare traditional and machine learning imputation methods. The goal of this evaluation is to recommend the best method for imputing missing data on the sampling frame for the full-scale study, which aims to interview 5,000 individuals across the country selected through address-based sampling.
|