Abstract Details

Activity Number: 292 - Providing Access to Useful Data While Preserving Confidentiality
Type: Topic Contributed
Date/Time: Tuesday, July 30, 2019 : 8:30 AM to 10:20 AM
Sponsor: Survey Research Methods Section
Abstract #304884 Presentation
Title: Bayesian Pseudo Posterior Synthesis for Data Privacy Protection
Author(s): Jingchen Hu* and Terrance Savitsky and Matthew Williams
Companies: Vassar College and Bureau of Labor Statistics and National Science Foundation
Keywords: Bayesian hierarchical models; Data privacy protection; Identification risks; Pseudo posterior; Synthetic data

Statistical agencies use models to synthesize respondent-level data for release to the general public as an alternative to the actual data records. A Bayesian model synthesizer encodes privacy protection through a hierarchical prior construction that smooths the real data distribution. Synthetic respondent-level data records are often preferred to summary data tables because they support many possible uses by researchers and data analysts. Agencies balance the utility of the synthetic data against disclosure risk, and they hold a specific target threshold for disclosure risk that must be met before synthetic datasets are released. We introduce a pseudo posterior likelihood that exponentiates each likelihood contribution by an observation record-indexed weight ∈ (0, 1), defined to be inversely proportional to the disclosure risk for that record in the synthetic data. Our use of a vector of weights allows more precise downweighting of high-risk records, in a fashion that better preserves utility than a scalar weight does. We illustrate our method with a simulation study and an application to the Consumer Expenditure Survey of the U.S. Bureau of Labor Statistics.
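
As a rough illustration of the record-weighted pseudo posterior described in the abstract, the sketch below uses a deliberately simple stand-in: a normal model with known standard deviation and a normal prior on the mean, for which the weighted posterior has a closed form. This is not the authors' hierarchical synthesizer. The function names, the `1 - risk` mapping from risk scores to weights, and the prior settings are all assumptions made only for this example; the abstract states only that each weight lies in (0, 1) and decreases with the record's disclosure risk.

```python
import numpy as np


def risk_to_weights(risks, eps=1e-6):
    """Map per-record risk scores in [0, 1] to weights in (0, 1).

    Assumption: weight = 1 - risk, clipped away from 0 and 1.
    The abstract does not specify the exact mapping.
    """
    risks = np.asarray(risks, dtype=float)
    return np.clip(1.0 - risks, eps, 1.0 - eps)


def pseudo_posterior_normal_mean(y, weights, sigma=1.0,
                                 prior_mean=0.0, prior_sd=10.0):
    """Closed-form pseudo posterior for the mean of a normal model.

    Each likelihood term N(y_i | theta, sigma^2) is raised to the
    power weights[i], so record i contributes weights[i] / sigma^2
    to the posterior precision instead of 1 / sigma^2.
    """
    y = np.asarray(y, dtype=float)
    weights = np.asarray(weights, dtype=float)
    prior_prec = 1.0 / prior_sd ** 2
    post_prec = prior_prec + weights.sum() / sigma ** 2
    post_mean = (prior_prec * prior_mean
                 + (weights * y).sum() / sigma ** 2) / post_prec
    return post_mean, np.sqrt(1.0 / post_prec)


def synthesize(y, weights, rng, sigma=1.0, prior_mean=0.0, prior_sd=10.0):
    """Draw one synthetic dataset of the same size as y.

    Draws theta from the pseudo posterior, then draws synthetic
    records from the (unweighted) model given theta.
    """
    m, s = pseudo_posterior_normal_mean(y, weights, sigma,
                                        prior_mean, prior_sd)
    theta = rng.normal(m, s)
    return rng.normal(theta, sigma, size=len(y))


# Toy data: the last record is an outlier with a high (made-up) risk score.
y = np.array([1.0, 1.2, 0.8, 9.0])
risks = np.array([0.1, 0.1, 0.1, 0.95])
weights = risk_to_weights(risks)

rng = np.random.default_rng(0)
synthetic = synthesize(y, weights, rng)
```

With a vector of weights, only the high-risk record is strongly downweighted, so the pseudo posterior mean stays close to the bulk of the data. Applying one scalar weight to every record would leave the outlier's relative influence on the mean unchanged and would only flatten the posterior overall.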

Authors who are presenting talks have a * after their name.

Back to the full JSM 2019 program