Activity Number: 465 - Privacy, Confidentiality, and Disclosure Limitation
Type: Contributed
Date/Time: Thursday, August 6, 2020 : 10:00 AM to 2:00 PM
Sponsor: Government Statistics Section
Abstract #312865
Title: Generating Poisson-Distributed Differentially Private Synthetic Data
Author(s): Harrison Quick*
Companies: Drexel University
Keywords: Bayesian methods; Confidentiality; Data suppression; Disclosure risk; Spatial data; Uncertainty
Abstract:
The dissemination of synthetic data can be an effective means of making information from sensitive data publicly available while reducing the risk of disclosure associated with releasing the sensitive data directly. While mechanisms exist for synthesizing data that satisfy formal privacy guarantees, the utility of the synthetic data is often an afterthought. More recently, the use of methods from the disease mapping literature has been proposed to generate spatially referenced synthetic data with high utility, albeit without formal privacy guarantees. The objective of this paper is to help bridge the gap between the disease mapping and the formal privacy literatures. In particular, we generalize an existing approach for generating formally private synthetic data to the case of Poisson-distributed count data in a way that accommodates heterogeneity in population sizes and allows for the infusion of prior information. We demonstrate via a simulation study that the proposed approach for generating differentially private synthetic data outperforms a popular technique when the counts correspond to events arising from subgroups with unequal population sizes or unequal event rates.
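To make the idea of synthesizing Poisson counts with unequal population sizes and prior information more concrete, the following is a minimal sketch only. The abstract does not state the model or the privacy mechanism, so everything here is an assumption: a conjugate Poisson-gamma model, the helper names `sample_poisson` and `synthesize_counts`, the prior parameters, and the toy data are all hypothetical. The sketch draws synthetic counts from a posterior predictive distribution and does not, by itself, establish any differential privacy guarantee; the conditions required for that are not given in the abstract.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw one Poisson variate with Knuth's multiplication method (suitable for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def synthesize_counts(counts, populations, prior_shape, prior_rate, rng):
    """Return one synthetic count per subgroup from an assumed Poisson-gamma model.

    Assumed model (not taken from the abstract):
        y_i | rate_i ~ Poisson(n_i * rate_i),  rate_i ~ Gamma(prior_shape, prior_rate)
    """
    synthetic = []
    for y, n in zip(counts, populations):
        # Conjugate update: rate_i | y_i ~ Gamma(shape = a + y_i, rate = b + n_i).
        shape = prior_shape + y
        rate = prior_rate + n
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        event_rate = rng.gammavariate(shape, 1.0 / rate)
        # The synthetic count reuses the subgroup's own population size n_i.
        synthetic.append(sample_poisson(n * event_rate, rng))
    return synthetic


rng = random.Random(2020)
counts = [3, 12, 0, 7]                 # hypothetical observed event counts
populations = [500, 4000, 150, 900]    # hypothetical, unequal population sizes
synthetic = synthesize_counts(
    counts, populations, prior_shape=1.0, prior_rate=200.0, rng=rng
)
print(synthetic)
```

In this sketch, the prior parameters control how strongly each subgroup's synthetic rate is pulled toward the prior mean `prior_shape / prior_rate`, which is one simple way prior information could enter; small subgroups are affected more than large ones because their population size contributes less to the posterior rate parameter.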
Authors who are presenting talks have a * after their name.
Back to the full JSM 2020 program