Abstract Details

Activity Number: 252 - SPEED: Improving Survey Data Quality with Multiple Data Sources, Administrative Data, and Nonresponse Bias Control, Part 2
Type: Contributed
Date/Time: Monday, July 29, 2019 : 2:00 PM to 2:45 PM
Sponsor: Survey Research Methods Section
Abstract #307637
Title: An Evaluation of Traditional and Machine Learning Imputation Methods for Sampling Frame Construction for the American Voices Project
Author(s): Cong Ye*
Keywords: imputation; machine learning; hot-deck; sampling frame; missing data; address-based sampling

Missing data imputation is usually an important step in complex survey sampling, where complete data on key variables are desirable for every case on the frame. The American Voices Project uses address-based sampling based on an address frame provided by a licensed vendor. However, as we learned in the pilot study, the address list has missing data on key variables (e.g., income, education, race/ethnicity), so the missing values must be imputed before the frame can be used for sampling. During the pilot study, the values were imputed using the traditional hot-deck method, which fills each missing value with the observed value from a donor case with similar characteristics. The 2018 pilot study data allow us to compare traditional and machine learning imputation methods. The goal of this evaluation is to recommend the best method for imputing missing data on the sampling frame for the full-scale study, which aims to interview 5,000 individuals across the country selected through address-based sampling.
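
As an illustration of the hot-deck idea described above, the following is a minimal sketch rather than the procedure used in the pilot study. It assumes a random hot deck within imputation classes: cases are grouped by fully observed variables, and each missing value is filled with the value from a randomly chosen donor in the same class. The function name `hot_deck_impute`, the field names (`region`, `dwelling`, `income`), and the toy frame are all hypothetical; the abstract does not state which variables define similarity.

```python
import random
from collections import defaultdict


def hot_deck_impute(records, target, class_vars, seed=0):
    """Random hot deck within classes (illustrative sketch only).

    records    : list of dicts, one per frame case; None marks a missing value
    target     : name of the variable to impute (hypothetical field name)
    class_vars : fully observed variables that define "similar" cases
    """
    rng = random.Random(seed)

    # Collect observed target values (the donor pool) for each imputation class.
    donors = defaultdict(list)
    for rec in records:
        if rec[target] is not None:
            key = tuple(rec[v] for v in class_vars)
            donors[key].append(rec[target])

    # Copy each record; fill a missing target from a random donor in its class.
    completed = []
    for rec in records:
        new_rec = dict(rec)
        if new_rec[target] is None:
            pool = donors.get(tuple(rec[v] for v in class_vars))
            if pool:  # leave the value missing when the class has no donor
                new_rec[target] = rng.choice(pool)
        completed.append(new_rec)
    return completed


# Toy address frame with hypothetical fields and values.
frame = [
    {"address_id": 1, "region": "West", "dwelling": "single", "income": "high"},
    {"address_id": 2, "region": "West", "dwelling": "single", "income": "high"},
    {"address_id": 3, "region": "West", "dwelling": "single", "income": None},
    {"address_id": 4, "region": "South", "dwelling": "multi", "income": "low"},
    {"address_id": 5, "region": "South", "dwelling": "multi", "income": None},
    {"address_id": 6, "region": "East", "dwelling": "single", "income": None},
]

completed = hot_deck_impute(frame, "income", ["region", "dwelling"])
for rec in completed:
    print(rec["address_id"], rec["income"])
```

In this sketch, a case whose class contains no donor keeps its missing value; in practice one might collapse classes or fall back to a broader donor pool, which is a design choice the abstract does not address.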

Authors who are presenting talks have a * after their name.

Back to the full JSM 2019 program