Abstract:
The current state-of-the-art methodology for drawing valid inference from synthetic data is based on concepts of multiple imputation for missing data, and therefore requires that multiple synthetic datasets be released. However, for some partially synthetic data products, such as the Synthetic Longitudinal Business Database, only a single synthetic dataset is released. Motivated by such examples of singly imputed partially synthetic data, Klein and Sinha (2014) developed new model-based methodology for drawing valid inference in these cases. Thus, under model-based assumptions, singly imputed partially synthetic data can yield valid inference when analyzed correctly, and a comparison with multiply imputed partially synthetic data is in order. In this paper, we compare singly and multiply imputed partially synthetic data generated via plug-in sampling (Reiter and Kinney 2012) in terms of efficiency of inference and level of privacy protection. We find that multiply imputed partially synthetic data can yield more efficient inference than singly imputed partially synthetic data, while singly imputed partially synthetic data can yield an enhanced level of privacy protection.
ASA Meetings Department
732 North Washington Street, Alexandria, VA 22314
(703) 684-1221 • meetings@amstat.org
Copyright © American Statistical Association.