Abstract Details

Activity Number: 292 - Providing Access to Useful Data While Preserving Confidentiality
Type: Topic Contributed
Date/Time: Tuesday, July 30, 2019 : 8:30 AM to 10:20 AM
Sponsor: Survey Research Methods Section
Abstract #305026
Title: PMSE Mechanism: Differentially Private Synthetic Data with Maximal Distributional Similarity
Author(s): Joshua Snoke* and Aleksandra Slavkovic
Companies: RAND Corporation and Penn State University
Keywords: Synthetic Data; Differential Privacy; Propensity Score; CART; Machine Learning; Confidentiality

We propose a method for the release of differentially private synthetic datasets. In many contexts, data contain sensitive values which cannot be released in their original form in order to protect individuals’ privacy. Synthetic data is a protection method that releases alternative values in place of the original ones, and differential privacy (DP) is a formal guarantee for quantifying the privacy loss. We propose a method that maximizes the distributional similarity of the synthetic data relative to the original data using a measure known as the pMSE, while guaranteeing ε-DP. We relax common DP assumptions concerning the distribution and boundedness of the original data. We prove theoretical results for the privacy guarantee and provide simulations for the empirical failure rate of the theoretical results under typical computational limitations. We give simulations for the accuracy of linear regression coefficients generated from the synthetic data compared with the accuracy of non-DP synthetic data and other DP methods. Additionally, our theoretical results extend a prior result for the sensitivity of the Gini Index to include continuous predictors.
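
For readers unfamiliar with the pMSE, the following is a minimal sketch of how the measure is commonly computed in the synthetic data literature; the abstract itself does not state the formula, so the definition used here is an assumption: stack the original and synthetic records, estimate each record's propensity of being synthetic, and average the squared deviation of those propensities from the synthetic share c. The one-split "stump" below is a hypothetical stand-in for a CART model, and the function names and toy data are illustrative only. The sketch is not differentially private and is not the authors' mechanism.

```python
def pmse(propensity_scores, n_synthetic):
    """Mean squared deviation of propensities from the synthetic share c.

    Assumed definition: pMSE = (1/N) * sum_i (p_i - c)^2, with c = n_synthetic / N.
    """
    n_total = len(propensity_scores)
    c = n_synthetic / n_total
    return sum((p - c) ** 2 for p in propensity_scores) / n_total


def stump_propensities(values, is_synthetic, threshold):
    """Propensities from a one-split tree (hypothetical stand-in for CART).

    Each record gets the fraction of synthetic records in its leaf.
    """
    left = [s for v, s in zip(values, is_synthetic) if v <= threshold]
    right = [s for v, s in zip(values, is_synthetic) if v > threshold]
    p_left = sum(left) / len(left) if left else 0.0
    p_right = sum(right) / len(right) if right else 0.0
    return [p_left if v <= threshold else p_right for v in values]


def score(original, synthetic, threshold):
    """Stack both datasets, label synthetic rows 1, and return the pMSE."""
    values = list(original) + list(synthetic)
    labels = [0] * len(original) + [1] * len(synthetic)
    return pmse(stump_propensities(values, labels, threshold), len(synthetic))


# Toy data (invented for illustration, not from the paper).
original = [1.0, 2.0, 3.0, 4.0]
good_synth = [1.5, 2.5, 3.5, 4.5]   # overlaps the original values
bad_synth = [5.0, 6.0, 7.0, 8.0]    # entirely separable from the original

print(score(original, good_synth, threshold=2.75))
print(score(original, bad_synth, threshold=4.5))
```

Under this definition, a pMSE near zero means the model cannot tell synthetic records from original ones, which is the sense in which a low pMSE indicates high distributional similarity. With equal sample sizes, perfectly separable data gives the largest possible value.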

Authors who are presenting talks have a * after their name.
