Abstract:
|
Target reliability often degrades during post-data-collection processing, for example when noise is added to the data to protect confidentiality. One systematic way to introduce noise is to generate differentially private synthetic data, which are used in place of the collected data to produce summary statistics. In this paper, we explore how to modify the sampling parameters of an informative design to ensure target reliability in the protected statistics. Assuming that analysts ultimately have access only to the differentially private synthetic data, we propose an iterative simulation procedure that selects repeated samples from a known finite population (frame), produces multiple differentially private synthetic copies of each selected sample at a fixed value of epsilon, and assesses the effect of privacy protection on target reliability over repeated samples, as measured by the coefficient of variation. Using simulated and empirical data, we demonstrate how this approach can aid the sample design process by evaluating alternative designs and parameters to achieve the best reliability for a given value of epsilon.
|
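The simulation loop described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' implementation: the function names `dp_synthetic_copy` and `simulated_cv` are hypothetical, the synthesizer is a simple Laplace-noised histogram standing in for whatever differentially private generator is actually used, the design is simple random sampling without replacement rather than an informative design, the histogram bounds are taken from the frame and treated as public, and the frame itself is simulated lognormal data.

```python
import numpy as np


def dp_synthetic_copy(sample, edges, epsilon, rng):
    """Hypothetical stand-in synthesizer: noisy histogram, then resampling."""
    counts, _ = np.histogram(sample, bins=edges)
    # Adding or removing one record changes a single bin count by 1,
    # so Laplace noise with scale 1/epsilon is applied to each count.
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=counts.size)
    noisy = np.clip(noisy, 0.0, None)  # post-processing: no negative counts
    if noisy.sum() == 0:
        noisy = np.ones_like(noisy)  # fall back to a uniform histogram
    probs = noisy / noisy.sum()
    # Draw a bin for each synthetic record, then a value uniformly inside it.
    bins = rng.choice(counts.size, size=sample.size, p=probs)
    return rng.uniform(edges[bins], edges[bins + 1])


def simulated_cv(population, n, epsilon, n_samples=200, n_copies=5,
                 n_bins=20, seed=0):
    """CV of the synthetic-data mean over repeated samples and copies."""
    rng = np.random.default_rng(seed)
    # Assumption: the frame's range is public and fixes the bin edges.
    edges = np.linspace(population.min(), population.max(), n_bins + 1)
    estimates = []
    for _ in range(n_samples):
        # Simplifying assumption: simple random sampling without replacement.
        sample = rng.choice(population, size=n, replace=False)
        for _ in range(n_copies):
            synthetic = dp_synthetic_copy(sample, edges, epsilon, rng)
            estimates.append(synthetic.mean())
    estimates = np.asarray(estimates)
    return estimates.std(ddof=1) / abs(estimates.mean())


# Simulated frame (illustrative only) and a comparison of sample sizes.
frame = np.random.default_rng(42).lognormal(mean=3.0, sigma=0.5, size=5000)
for n in (50, 200, 800):
    print(n, round(simulated_cv(frame, n, epsilon=1.0), 4))
```

Running the loop over candidate sample sizes at a fixed epsilon gives one coefficient of variation per design, which can then be compared against the reliability target. An informative design would replace the simple random draw with its own selection probabilities and weighted estimator.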