Activity Number: 239 - Synthetic Data and Differential Privacy: Data, Privacy and the Public Good
Type: Invited
Date/Time: Tuesday, August 4, 2020 : 1:00 PM to 2:50 PM
Sponsor: Survey Research Methods Section
Abstract #308102
Title: Bayesian Pseudo Posterior Mechanism Under Differential Privacy
Author(s): Terrance Savitsky and Matthew Williams* and Jingchen (Monika) Hu
Companies: Bureau of Labor Statistics and National Center for Science and Engineering Statistics, National Science Foundation and Vassar College
Keywords: Differential privacy; Pseudo posterior; Pseudo posterior mechanism; Synthetic data
Abstract:
We propose a Bayesian pseudo posterior mechanism to generate record-level synthetic datasets with a differential privacy (DP) guarantee from any proposed synthesizer model. The mechanism employs a record-indexed, risk-based weight vector that surgically downweights high-risk records in the generation and release of record-level synthetic data. The pseudo posterior synthesizer constructs the weights from Lipschitz bounds on the log-likelihood of each data record. This provides a practical, general formulation for weights based on record-level sensitivities, which we show achieves dramatic improvements in the DP guarantee compared to the unweighted, non-private synthesizer. We demonstrate the method on family income data from the Consumer Expenditure Surveys (CE), published by the U.S. Bureau of Labor Statistics. We show that our pseudo posterior mechanism preserves utility better than the exponential mechanism (EM) estimated on the same non-private synthesizer.
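
The weighting idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a toy Normal(mu, 1) synthesizer, a bounded grid of mu values standing in for the parameter space, and a simple min-max rule for turning per-record bounds into weights in [0, 1]. The function names (log_lik, record_bounds, risk_weights) and the data are hypothetical and chosen only for illustration.

```python
import math

def log_lik(x, mu, sigma=1.0):
    """Log density of Normal(mu, sigma^2) at x (toy synthesizer, an assumption)."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)

def record_bounds(data, mu_grid):
    """Per-record bound: largest |log-likelihood| over the parameter grid."""
    return [max(abs(log_lik(x, mu)) for mu in mu_grid) for x in data]

def risk_weights(bounds):
    """Min-max rule (an assumption): a larger bound gives a smaller weight in [0, 1]."""
    lo, hi = min(bounds), max(bounds)
    if hi == lo:
        # All records are equally risky, so no record is downweighted.
        return [1.0] * len(bounds)
    return [1.0 - (b - lo) / (hi - lo) for b in bounds]

# Hypothetical data: four typical records and one outlier (the last value).
data = [0.1, -0.3, 0.5, 0.2, 6.0]
# Bounded grid standing in for the parameter space (an assumption).
mu_grid = [-1.0, -0.5, 0.0, 0.5, 1.0]

bounds = record_bounds(data, mu_grid)
weights = risk_weights(bounds)

# In a pseudo likelihood, each record's log-likelihood is multiplied by its
# weight, so its bound is scaled by the same factor.
weighted_bounds = [w * b for w, b in zip(weights, bounds)]

unweighted_max = max(bounds)
weighted_max = max(weighted_bounds)

print("weights:", [round(w, 3) for w in weights])
print("max bound, unweighted:", round(unweighted_max, 3))
print("max bound, weighted:", round(weighted_max, 3))
```

In this toy setting the outlying record receives the smallest weight, and the largest weighted bound falls below the largest unweighted bound, which mirrors the abstract's point that downweighting high-risk records tightens the bound tied to the DP guarantee.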
Authors who are presenting talks have a * after their name.