Abstract:
|
When data are released to the public, a vital concern is the risk of exposing personal information about the individuals who contributed to the data set. Many mechanisms have been proposed to protect individual privacy, but less attention has been devoted to conducting valid inference in practice on the resulting privacy-protected data sets. For frequency tables, privacy-protecting perturbations often produce negative cell counts, and releasing such tables can undermine users' confidence in the usefulness of the data. In this paper, focusing on the release of one-way frequency tables, we recommend an optimal mechanism that satisfies epsilon-differential privacy (DP) without producing negative cell counts. The procedure is optimal in the sense that the expected utility is maximized under a given privacy constraint. Valid inference procedures for goodness-of-fit tests are developed for the DP-protected data sets. In particular, we propose a de-biased test statistic for the optimal procedure and derive its asymptotic distributions. The decay rate required of the privacy regime for the inference procedure to be valid is provided. We further co
|