Activity Number: 149 - Generating Data for the Public Good While Adhering to Confidentiality and Privacy Restrictions
Type: Invited
Date/Time: Tuesday, August 4, 2020 : 10:00 AM to 11:50 AM
Sponsor: Health Policy Statistics Section
Abstract #308141
Title: Risk-Efficient Bayesian Data Synthesis for Privacy Protection
Author(s): Jingchen (Monika) Hu* and Terrance Savitsky and Matthew Williams
Companies: Vassar College and Bureau of Labor Statistics and National Center for Science and Engineering Statistics, National Science Foundation
Keywords: Data privacy protection; Identification risks; Pairwise; Pseudo posterior; Synthetic data
Abstract:
High-utility, low-risk synthetic data facilitate microdata dissemination by statistical agencies. In previous work, we induced privacy protection into any Bayesian data synthesis model by employing a pseudo posterior likelihood that exponentiates each record's likelihood contribution by a record-indexed weight in [0, 1], defined to be inversely proportional to the marginal identification risk for that record. This work constructs a weight for each record from a collection of pairwise identification risk probabilities with other records, where each pairwise probability measures the joint probability of re-identification of the pair of records. The by-record weights constructed from the pairwise identification risk probabilities tie together the identification risk probabilities across the data records and compress the distribution of by-record risks, which produces a more efficient set of synthetic data with lower risk and higher utility. We illustrate our method with an application to the Consumer Expenditure Surveys of the U.S. Bureau of Labor Statistics.
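
As a rough illustration of the weighting idea described in the abstract, the Python sketch below down-weights each record's log-likelihood contribution by a weight in [0, 1]. Raising a likelihood contribution to the power of a weight is the same as multiplying its log by that weight. The abstract does not say how pairwise probabilities are combined into a by-record weight, so the rule used here (one minus the mean pairwise risk) is purely an assumption, as are the normal model and the function names `weights_from_pairwise_risks` and `pseudo_log_likelihood`.

```python
import math


def weights_from_pairwise_risks(pairwise):
    """Map an n x n matrix of pairwise risk probabilities to weights in [0, 1].

    Assumption (not from the abstract): a record's risk is summarized as the
    mean of its pairwise probabilities with the other records, and its weight
    is one minus that mean, clipped to [0, 1].
    """
    n = len(pairwise)
    if n < 2:
        raise ValueError("need at least two records to form pairs")
    weights = []
    for i in range(n):
        others = [pairwise[i][j] for j in range(n) if j != i]
        risk = sum(others) / len(others)
        weights.append(min(1.0, max(0.0, 1.0 - risk)))
    return weights


def pseudo_log_likelihood(y, weights, mu, sigma):
    """Weighted normal log-likelihood: sum_i w_i * log N(y_i | mu, sigma^2).

    The normal model is a stand-in; any synthesis model's per-record
    log-likelihood could take its place.
    """
    total = 0.0
    for y_i, w_i in zip(y, weights):
        log_density = (
            -0.5 * math.log(2.0 * math.pi * sigma ** 2)
            - (y_i - mu) ** 2 / (2.0 * sigma ** 2)
        )
        total += w_i * log_density
    return total


# Hypothetical pairwise risk probabilities for three records.
pairwise = [
    [0.0, 0.2, 0.4],
    [0.2, 0.0, 0.6],
    [0.4, 0.6, 0.0],
]
weights = weights_from_pairwise_risks(pairwise)
value = pseudo_log_likelihood([1.0, 2.0, 3.0], weights, mu=2.0, sigma=1.0)
```

With all weights equal to 1 the function reduces to the ordinary log-likelihood, and records with higher assumed risk contribute less to the fit, which is the mechanism that trades utility for privacy protection.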
Authors who are presenting talks have a * after their name.