Abstract:
|
When sharing data among collaborators or releasing data publicly, a crucial concern is the risk of exposing the personal information of individuals who contribute to the data. Many statistical methods for data privacy and confidentiality offer little to no means of measuring an altered dataset's privacy guarantee. Differential privacy, a condition on data-releasing algorithms, quantifies disclosure risk, but it is traditionally applied to query-based privacy methods rather than to synthetic dataset releases. We incorporate differential privacy into our method to create multiple synthetic datasets from predicted values in a Bayesian framework, with a privacy guarantee at a prespecified level. Since differential privacy quantifies disclosure risk, we can adjust the amount of privacy in the synthetic datasets. We apply two differentially private algorithms, the Laplace and Exponential mechanisms, in our method with varying levels of privacy guarantee. We use simulation studies and a case study to examine statistical inferences based on differentially private synthetic data and compare them with inferences based on the raw data as well as on the original multiple imputation.
|
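As an illustration of the Laplace mechanism mentioned above, the following is a minimal sketch (function name and interface are our own, not the paper's implementation): a statistic with known sensitivity is perturbed with Laplace noise of scale sensitivity/epsilon, which yields epsilon-differential privacy for that release.

```python
import random

def laplace_mechanism(value, sensitivity, epsilon, rng=random):
    """Add Laplace(0, sensitivity/epsilon) noise to a numeric statistic.

    This satisfies epsilon-differential privacy for a query whose output
    changes by at most `sensitivity` when one individual's record changes.
    """
    lam = epsilon / sensitivity
    # The difference of two iid Exponential(lam) draws is Laplace(0, 1/lam).
    noise = rng.expovariate(lam) - rng.expovariate(lam)
    return value + noise

# Example: privatize a count query (sensitivity 1) at epsilon = 0.5.
private_count = laplace_mechanism(1234, sensitivity=1.0, epsilon=0.5)
```

Smaller epsilon means a stronger privacy guarantee but noisier releases, which is the privacy-utility trade-off the abstract's simulation studies examine.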
Copyright © American Statistical Association.