Abstract:
When a statistical agency releases survey data to the public, it is responsible for disseminating high-quality data while protecting the privacy of respondents. As collected, data often contain missing, inconsistent, or implausible values. Agencies prefer to handle such values through imputation and editing, followed by disclosure limitation. To date, these three processes have been largely disconnected, and the impact of each processing step on final inferences from the released data is often unclear. In this study, we propose a multiple imputation approach that simultaneously handles missing and faulty data and then generates synthetic data, leveraging a nonparametric Bayesian model. More specifically, the synthesizer generates synthetic data that preserve joint distributional features of the original data and lead to final inferences that appropriately reflect the uncertainty introduced by the imputation, editing, and synthesis processes. We apply the method to generate synthetic public use datasets for the 2007 U.S. Census of Manufactures.
Copyright © American Statistical Association.