All Times EDT

Abstract Details

Activity Number: 149 - Generating Data for the Public Good While Adhering to Confidentiality and Privacy Restrictions
Type: Invited
Date/Time: Tuesday, August 4, 2020 : 10:00 AM to 11:50 AM
Sponsor: Health Policy Statistics Section
Abstract #308141
Title: Risk-Efficient Bayesian Data Synthesis for Privacy Protection
Author(s): Jingchen (Monika) Hu* and Terrance Savitsky and Matthew Williams
Companies: Vassar College and Bureau of Labor Statistics and National Center for Science and Engineering Statistics, National Science Foundation
Keywords: Data privacy protection; Identification risks; Pairwise; Pseudo posterior; Synthetic data

High-utility, low-risk synthetic data facilitate microdata dissemination by statistical agencies. In previous work, we induced privacy protection in any Bayesian data synthesis model by employing a pseudo posterior likelihood that exponentiates each likelihood contribution by a record-indexed weight in [0, 1], defined to be inversely proportional to the marginal identification risk for that record. This work constructs the weight for each record from a collection of pairwise identification risk probabilities with other records, where each pairwise probability measures the joint probability of re-identification of the pair of records. The by-record weights constructed from the pairwise identification risk probabilities tie together the identification risk probabilities across the data records and compress the distribution of by-record risks, which produces a more efficient set of synthetic data with lower risk and higher utility. We illustrate our method with an application to the Consumer Expenditure Surveys of the U.S. Bureau of Labor Statistics.
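
The weighting idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the abstract: the rule that averages a record's pairwise risk probabilities and sets the weight to one minus that average, the normal data model, and the function names `pairwise_weights` and `log_pseudo_likelihood`. The authors' actual construction may differ.

```python
import math


def pairwise_weights(pair_risk):
    """Turn a symmetric matrix of pairwise risk probabilities into by-record weights.

    pair_risk[i][j] is a hypothetical joint re-identification probability for
    records i and j. ASSUMPTION: a record's risk is the mean of its pairwise
    probabilities with the other records, and its weight is 1 minus that risk,
    clipped to [0, 1]. Requires at least two records.
    """
    n = len(pair_risk)
    weights = []
    for i in range(n):
        others = [pair_risk[i][j] for j in range(n) if j != i]
        risk_i = sum(others) / len(others)
        weights.append(min(1.0, max(0.0, 1.0 - risk_i)))
    return weights


def normal_logpdf(y, mu, sigma):
    """Log density of a Normal(mu, sigma^2) distribution at y."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2.0 * sigma ** 2)


def log_pseudo_likelihood(y, weights, mu, sigma):
    """Sum of weight-scaled log-likelihood contributions.

    Scaling a log contribution by w_i is the same as raising the likelihood
    contribution to the power w_i, so high-risk records (small w_i) influence
    the fitted synthesis model less.
    """
    return sum(w * normal_logpdf(yi, mu, sigma) for yi, w in zip(y, weights))


if __name__ == "__main__":
    # Made-up pairwise risk probabilities for three records.
    risks = [
        [0.0, 0.2, 0.4],
        [0.2, 0.0, 0.6],
        [0.4, 0.6, 0.0],
    ]
    w = pairwise_weights(risks)
    print(w)
    print(log_pseudo_likelihood([1.0, 2.5, 4.0], w, mu=2.0, sigma=1.0))
```

With all weights equal to 1 the function reduces to the ordinary log-likelihood, and with all weights equal to 0 the data contribute nothing, which are the two ends of the risk-utility trade-off the abstract describes.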

Authors who are presenting talks have a * after their name.
