All Times EDT

Abstract Details

Activity Number: 239 - Synthetic Data and Differential Privacy: Data, Privacy and the Public Good
Type: Invited
Date/Time: Tuesday, August 4, 2020 : 1:00 PM to 2:50 PM
Sponsor: Survey Research Methods Section
Abstract #308102
Title: Bayesian Pseudo Posterior Mechanism Under Differential Privacy
Author(s): Terrance Savitsky and Matthew Williams* and Jingchen (Monika) Hu
Companies: Bureau of Labor Statistics and National Center for Science and Engineering Statistics, National Science Foundation and Vassar College
Keywords: Differential privacy; Pseudo posterior; Pseudo posterior mechanism; Synthetic data
Abstract:

We propose a Bayesian pseudo posterior mechanism that generates record-level synthetic datasets with a differential privacy (DP) guarantee from any proposed synthesizer model. The mechanism employs a record-indexed, risk-based weight vector that selectively downweights high-risk records when generating and releasing record-level synthetic data. The weights are constructed from Lipschitz bounds on the log-likelihood of each data record, which provides a practical, general formulation for weighting by record-level sensitivity. We show that this weighting achieves dramatic improvements in the DP guarantee compared with the unweighted, non-private synthesizer. We demonstrate the method on family income data from the Consumer Expenditure Surveys (CE), published by the U.S. Bureau of Labor Statistics. We show that our pseudo posterior mechanism preserves utility better than the exponential mechanism (EM) estimated on the same non-private synthesizer.
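
The weighting idea can be illustrated with a minimal sketch. This is not the authors' implementation; it rests on several assumptions made only for illustration: a normal likelihood, a finite set of parameter draws standing in for the parameter space, and a hypothetical linear map from per-record bounds to weights in [0, 1]. All function names are hypothetical. Each record's log-likelihood contribution is multiplied by its weight, so a record with a large absolute log-likelihood (a high-risk record) contributes less.

```python
import math
import random


def normal_loglik(x, mu, sigma):
    """Log-density of a Normal(mu, sigma^2) distribution at x (assumed model)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def record_bounds(data, draws):
    """Per-record bound: max over parameter draws of |log-likelihood|.

    The finite set of draws is an assumption standing in for the full
    parameter space.
    """
    return [max(abs(normal_loglik(x, mu, s)) for mu, s in draws) for x in data]


def risk_weights(bounds):
    """Hypothetical linear map: largest bound -> weight 0, smallest -> weight 1."""
    lo, hi = min(bounds), max(bounds)
    if hi == lo:
        return [1.0] * len(bounds)
    return [1.0 - (b - lo) / (hi - lo) for b in bounds]


def weighted_bound(bounds, weights):
    """Largest weighted per-record bound across the dataset."""
    return max(w * b for w, b in zip(weights, bounds))


def pseudo_loglik(data, weights, mu, sigma):
    """Weighted log-likelihood: each record's term is scaled by its weight."""
    return sum(w * normal_loglik(x, mu, sigma) for x, w in zip(data, weights))


random.seed(0)
# Simulated data (not the CE sample): 50 typical records plus one outlier.
data = [random.gauss(0, 1) for _ in range(50)] + [8.0]
# Stand-in parameter draws (mu, sigma); a real analysis would use posterior draws.
draws = [(random.gauss(0, 0.2), 1.0 + 0.1 * random.random()) for _ in range(100)]

bounds = record_bounds(data, draws)
weights = risk_weights(bounds)

print("unweighted bound:", max(bounds))
print("weighted bound:  ", weighted_bound(bounds, weights))
print("outlier weight:  ", weights[-1])
```

In this sketch the outlier receives weight zero and the largest weighted bound cannot exceed the unweighted one, which is the qualitative effect the abstract describes. How the bound translates into a formal DP guarantee is not stated in the abstract and is not modeled here.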


Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program