Activity Number: 292 - Providing Access to Useful Data While Preserving Confidentiality
Type: Topic Contributed
Date/Time: Tuesday, July 30, 2019 : 8:30 AM to 10:20 AM
Sponsor: Survey Research Methods Section
Abstract #304884
Presentation
Title: Bayesian Pseudo Posterior Synthesis for Data Privacy Protection
Author(s): Jingchen Hu* and Terrance Savitsky and Matthew Williams
Companies: Vassar College and Bureau of Labor Statistics and National Science Foundation
Keywords: Bayesian hierarchical models; Data privacy protection; Identification risks; Pseudo posterior; Synthetic data
Abstract:
Statistical agencies use models to synthesize respondent-level data for release to the general public as an alternative to the actual data records. A Bayesian model synthesizer encodes privacy protection by employing a hierarchical prior construction that induces smoothing of the real data distribution. Synthetic respondent-level data records are often preferred to summary data tables because of the many possible uses by researchers and data analysts. Agencies balance a trade-off between the utility of the synthetic data and its disclosure risk, and they hold a specific target threshold for disclosure risk before releasing synthetic datasets. We introduce a pseudo posterior likelihood that exponentiates each observation's likelihood contribution by a record-indexed weight ∈ (0, 1), defined to be inversely proportional to the disclosure risk for that record in the synthetic data. Our use of a vector of weights allows more precise downweighting of high-risk records in a fashion that better preserves utility than using a single scalar weight. We illustrate our method with a simulation study and an application to the Consumer Expenditure Survey of the U.S. Bureau of Labor Statistics.
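
The weighted likelihood described above can be written compactly. The display below is a sketch under assumed notation, none of which appears in the abstract: y_i is the i-th of n records, \theta collects the model parameters, \pi(\theta) is the prior, and \alpha_i is the weight for record i. The abstract states only that each weight lies in (0, 1) and decreases with that record's disclosure risk, so no specific formula for \alpha_i is given here.

```latex
% Assumed notation: y_i = record i, \theta = model parameters,
% \pi(\theta) = prior, \alpha_i = record-indexed weight.
\[
  p_{\boldsymbol{\alpha}}(\theta \mid y_1, \dots, y_n)
  \;\propto\;
  \left[ \prod_{i=1}^{n} p(y_i \mid \theta)^{\alpha_i} \right] \pi(\theta),
  \qquad \alpha_i \in (0, 1).
\]
% Scalar-weight comparison mentioned in the abstract:
% every record shares one weight, \alpha_i = \alpha for all i.
\[
  p_{\alpha}(\theta \mid y_1, \dots, y_n)
  \;\propto\;
  \left[ \prod_{i=1}^{n} p(y_i \mid \theta) \right]^{\alpha} \pi(\theta).
\]
```

In this form, a record with \alpha_i close to 0 contributes little to the fitted synthesizer, while a record with \alpha_i close to 1 contributes almost its full likelihood. The scalar version downweights every record equally, which is the comparison case the abstract contrasts with the vector of weights.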
Authors who are presenting talks have a * after their name.